Abstract
Recent text-to-image (T2I) generation methods have adopted reinforcement learning (RL) techniques for human preference alignment. However, existing approaches rely primarily on a single reward function, which can overfit to a specific metric and lead to reward hacking and imbalanced optimization across objectives. To address this, we propose Flow-Multi, a flow-matching multi-reward framework for text-to-image generation. Our method builds on flow-matching-based group-relative policy optimization (GRPO). Each sample is evaluated by four reward models (text-to-image alignment, human preference, aesthetic quality, and GenEval) to form a multi-dimensional reward vector. We then use the Pareto dominance relationship to remove dominated samples and update the policy using only the non-dominated set. Additionally, we introduce advantage masking during training to suppress the contribution of low-reward samples, so that only high-reward samples are reflected in policy optimization. Experimental results show that Flow-Multi achieves balanced improvements across multiple reward criteria compared with the existing Flow-GRPO, supporting the effectiveness of the multi-reward reinforcement learning framework for stable alignment in text-to-image generation.
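The Pareto filtering and advantage masking described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the scalarization (mean of per-reward z-scores within the group), and the masking rule (zeroing dominated samples and non-positive advantages) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def non_dominated_mask(rewards: np.ndarray) -> np.ndarray:
    """Boolean mask of Pareto non-dominated rows (higher reward is better)."""
    keep = np.ones(rewards.shape[0], dtype=bool)
    for i in range(rewards.shape[0]):
        # Row j dominates row i if it is >= on every reward and > on at least one.
        at_least_as_good = np.all(rewards >= rewards[i], axis=1)
        strictly_better = np.any(rewards > rewards[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            keep[i] = False
    return keep


def masked_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages with dominated / low-reward samples zeroed."""
    # Assumption: scalarize by averaging per-reward z-scores over the group.
    z = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    adv = z.mean(axis=1)
    # Assumption: "masking" means zeroing dominated samples and adv <= 0.
    keep = non_dominated_mask(rewards) & (adv > 0)
    return np.where(keep, adv, 0.0)


# Hypothetical group of 4 samples x 4 rewards
# (alignment, preference, aesthetic, GenEval), values made up.
group = np.array([
    [0.9, 0.2, 0.5, 0.7],
    [0.8, 0.1, 0.4, 0.6],  # dominated by row 0
    [0.3, 0.9, 0.6, 0.5],
    [0.3, 0.8, 0.6, 0.5],  # dominated by row 2
])
mask = non_dominated_mask(group)
adv = masked_advantages(group)
```

In this toy group, rows 1 and 3 are dominated and receive zero advantage, so only the two non-dominated samples would contribute to a policy update under these assumptions.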
| Original language | English |
|---|---|
| Article number | 1120 |
| Journal | Sensors |
| Volume | 26 |
| Issue number | 4 |
| DOIs | |
| State | Published - Feb 2026 |
Keywords
- flow matching
- multi-reward reinforcement learning
- text-to-image generation
Title
Flow-Multi: A Flow-Matching Multi-Reward Framework for Text-to-Image Generation