
Flow-Multi: A Flow-Matching Multi-Reward Framework for Text-to-Image Generation

  • Kyungpook National University

Research output: Contribution to journal › Article › peer-review

Abstract

Recent approaches in text-to-image (T2I) generation have actively adopted reinforcement learning (RL) techniques for human preference alignment. However, existing approaches primarily rely on a single reward function, which can lead to overfitting on specific metrics, resulting in issues such as reward hacking and imbalanced optimization among multiple objectives. To address this, we propose Flow-Multi: a flow-matching multi-reward framework for text-to-image generation. Our method builds upon flow-matching-based group-relative policy optimization (GRPO) learning. Each sample is evaluated by four reward models—based on text-to-image alignment, human preference, aesthetic quality, and GenEval—to create a multi-dimensional reward vector. We then utilize the Pareto dominance relationship to remove dominated samples and update the policy using only the non-dominated set. Additionally, we introduce advantage masking during training to suppress the contribution of low-reward samples, ensuring that only high-quality rewards are reflected in policy optimization. Experimental results demonstrate that Flow-Multi achieves balanced improvements across multiple reward criteria compared to the existing Flow-GRPO, validating the effectiveness of the multi-reward reinforcement learning framework for stable alignment in text-to-image generation.
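The filtering and masking steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify how the four rewards are combined into an advantage or what the masking rule is, so the sketch makes two assumptions: each reward is standardized within the group and averaged into a scalar, and the advantage is set to zero for Pareto-dominated samples and for samples whose advantage is not positive. All function names are hypothetical.

```python
from statistics import mean, pstdev


def dominates(a, b):
    """True if reward vector a Pareto-dominates b (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_mask(rewards):
    """One boolean per sample: True if no other sample in the group dominates it."""
    return [not any(dominates(other, r) for other in rewards) for r in rewards]


def standardize(values, eps=1e-8):
    """Group-relative normalization: subtract the group mean, divide by the group std."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / (s + eps) for v in values]


def masked_group_advantages(rewards):
    """rewards: one vector per sample, e.g. [alignment, preference, aesthetic, geneval]."""
    keep = non_dominated_mask(rewards)
    # Assumption: standardize each reward dimension over the group, then average.
    columns = [standardize(col) for col in zip(*rewards)]
    scalar = [mean(row) for row in zip(*columns)]
    adv = standardize(scalar)
    # Assumption: masking zeroes dominated samples and non-positive advantages.
    adv = [a if (k and a > 0) else 0.0 for a, k in zip(adv, keep)]
    return keep, adv


# Illustrative values only, not data from the paper.
group = [
    [0.9, 0.8, 0.7, 0.6],
    [0.5, 0.4, 0.3, 0.2],  # dominated by the first sample
    [0.2, 0.9, 0.5, 0.6],  # best on the second reward, so not dominated
    [0.9, 0.8, 0.7, 0.6],  # ties with the first sample; ties do not dominate
]
keep, adv = masked_group_advantages(group)
```

In this sketch, samples with zero advantage contribute nothing to a GRPO-style policy-gradient term, which is one simple way to update the policy using only the non-dominated set.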

Original language: English
Article number: 1120
Journal: Sensors
Volume: 26
Issue number: 4
State: Published - Feb 2026

Keywords

  • flow matching
  • multi-reward reinforcement learning
  • text-to-image generation
