R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation

Kaijie Chen; Zihao Lin; Zhiyang Xu; Ying Shen; Yuguang Yao; Joy Rimchala; Jiaxin Zhang; Lifu Huang

Abstract

Reasoning is a fundamental capability often required in real-world text-to-image (T2I) generation; for example, generating “a bitten apple that has been left in the air for more than a week” requires understanding temporal decay and commonsense concepts. While recent T2I models have made impressive progress in producing photorealistic images, their reasoning capability remains underdeveloped and insufficiently evaluated. To bridge this gap, we introduce R2I-Bench, a comprehensive benchmark designed to rigorously assess reasoning-driven T2I generation. R2I-Bench comprises 3,068 meticulously curated data instances spanning 7 core reasoning categories: commonsense, mathematical, logical, compositional, numerical, causal, and concept mixing. To facilitate fine-grained evaluation, we design R2IScore, a QA-style metric based on instance-specific, reasoning-oriented evaluation questions that assess three critical dimensions: text-image alignment, reasoning accuracy, and image quality. Extensive experiments with 16 representative T2I models, including a strong pipeline-based framework that decouples reasoning and generation using state-of-the-art language and image generation models, demonstrate consistently limited reasoning performance, highlighting the need for more robust, reasoning-aware architectures in the next generation of T2I systems.

Anthology ID: 2025.emnlp-main.636
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12606–12641
URL: https://aclanthology.org/2025.emnlp-main.636/
Cite (ACL): Kaijie Chen, Zihao Lin, Zhiyang Xu, Ying Shen, Yuguang Yao, Joy Rimchala, Jiaxin Zhang, and Lifu Huang. 2025. R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12606–12641, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation (Chen et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.636.pdf
Checklist: 2025.emnlp-main.636.checklist.pdf
