Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training

Xinyan Chen, Jiaxin Ge, Tianjun Zhang, Jiaming Liu, Shanghang Zhang


Abstract
Diffusion models have shown impressive performance in many domains. However, the model’s capability to follow natural language instructions (e.g., spatial relationships between objects, generating complex scenes) is still unsatisfactory. In this work, we propose Iterative Prompt Relabeling (IPR), a novel algorithm that aligns images to text through iterative image sampling and prompt relabeling with feedback. IPR first samples a batch of images conditioned on the text, then relabels the text prompts of unmatched text-image pairs with classifier feedback. We conduct thorough experiments on SDv2 and SDXL, testing their capability to follow instructions on spatial relations. With IPR, we improved up to 15.22% (absolute improvement) on the challenging spatial relation VISOR benchmark, demonstrating superior performance compared to previous RL methods. Our code is publicly available at https://github.com/cxy000000/IPR-RLDF.
Anthology ID:
2024.findings-emnlp.165
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2937–2952
Language:
URL:
https://aclanthology.org/2024.findings-emnlp.165
DOI:
Bibkey:
Cite (ACL):
Xinyan Chen, Jiaxin Ge, Tianjun Zhang, Jiaming Liu, and Shanghang Zhang. 2024. Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2937–2952, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training (Chen et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-emnlp.165.pdf