Generating Fine Details of Entity Interactions

Xinyi Gu; Jiayuan Mao


Abstract

Recent text-to-image models excel at generating high-quality object-centric images from instructions. However, images should also capture rich interactions between objects, an area where existing models often fall short, likely because training data and benchmarks for rare interactions are limited. This paper explores a novel application of Multimodal Large Language Models (MLLMs) to benchmark and enhance the generation of interaction-rich images. We introduce InterActing-1000, an interaction-focused dataset of 1000 LLM-generated, fine-grained prompts for image generation covering (1) functional and action-based interactions, (2) multi-subject interactions, and (3) compositional spatial relationships. To address the challenges of interaction-rich generation, we propose a decomposition-augmented refinement procedure. Our approach, DetailScribe, leverages LLMs to decompose interactions into finer-grained concepts, uses an MLLM to critique generated images, and applies targeted refinements with a partial diffusion denoising process. Automatic and human evaluations show significantly improved image quality, demonstrating the potential of enhanced inference strategies. Our dataset and code are available at https://detailscribe.github.io/.
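The abstract describes DetailScribe only at a high level (LLM decomposition, MLLM critique, refinement by partial diffusion denoising). The Python sketch below illustrates that control flow under our own assumptions. Every function name is hypothetical and not taken from the DetailScribe code. The model calls are passed in as plain callables. "Partial denoising" is modeled SDEdit-style: the current sample is re-noised to an intermediate step with the standard DDPM forward process and then denoised again. The paper's actual prompts, critique format, and noise level are not given here.

```python
import math
import random
from typing import Any, Callable, List, Sequence

# All names below are hypothetical; they are not taken from the DetailScribe code.
Image = Any  # placeholder for whatever image/latent type a real pipeline uses


def partial_noise(x0: Sequence[float], alpha_bar_t: float, rng: random.Random) -> List[float]:
    """Re-noise an existing sample to an intermediate step t using the standard
    DDPM forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    Denoising from x_t (rather than from pure noise) keeps the coarse layout of
    x_0; this SDEdit-style reading of "partial denoising" is an assumption."""
    if not 0.0 <= alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in [0, 1]")
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]


def decompose_critique_refine(
    prompt: str,
    decompose: Callable[[str], List[str]],              # LLM: prompt -> fine-grained concepts
    generate: Callable[[str], Image],                   # text-to-image model: initial sample
    critique: Callable[[Image, List[str]], List[str]],  # MLLM: concepts the image gets wrong
    refine: Callable[[Image, str, List[str]], Image],   # targeted edit, e.g. built on partial_noise
    max_rounds: int = 3,
) -> Image:
    """Generate once, then repeatedly critique and refine until the critic
    reports no issues or the round budget is spent."""
    concepts = decompose(prompt)
    image = generate(prompt)
    for _ in range(max_rounds):
        issues = critique(image, concepts)
        if not issues:  # critic is satisfied: stop early
            break
        image = refine(image, prompt, issues)
    return image


if __name__ == "__main__":
    # Toy stand-ins: an "image" is just the set of concepts it depicts correctly,
    # and each refinement round fixes the first reported issue.
    concepts = ["hand grips knife handle", "blade touches carrot", "carrot rests on board"]
    result = decompose_critique_refine(
        "a chef slicing a carrot",
        decompose=lambda p: concepts,
        generate=lambda p: {concepts[0]},
        critique=lambda img, cs: [c for c in cs if c not in img],
        refine=lambda img, p, issues: img | {issues[0]},
    )
    print(sorted(result))
    # With alpha_bar_t = 1.0 no noise is added, so the sample comes back unchanged.
    print(partial_noise([0.5, -1.0, 2.0], alpha_bar_t=1.0, rng=random.Random(0)))
```

In a real pipeline, `refine` would apply `partial_noise` to the current latent and run the diffusion model's reverse process from that intermediate step, conditioned on the critique. That step is not shown here. Smaller `alpha_bar_t` values correspond to starting from a noisier point, which allows larger changes at the cost of drifting further from the original image.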

Anthology ID: 2025.emnlp-industry.37
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 540–563
URL: https://aclanthology.org/2025.emnlp-industry.37/
Cite (ACL): Xinyi Gu and Jiayuan Mao. 2025. Generating Fine Details of Entity Interactions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 540–563, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): Generating Fine Details of Entity Interactions (Gu & Mao, EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-industry.37.pdf
