RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Aashish Anantha Ramakrishnan; Aadarsh Anantha Ramakrishnan; Dongwon Lee

doi:10.18653/v1/2025.in2writing-1.8

Abstract

Writing Assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by employing syntactic and semantic variations to describe image components. However, human-written captions prioritize conveying a central message alongside visual descriptions using pragmatic cues. To enhance caption diversity, it is essential to explore alternative ways of communicating these messages in conjunction with visual content. We propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLMs) that leverages Coherence Relations as a controllable axis for pragmatic variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment than MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA
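The abstract describes RONA only at the level of a prompting strategy. As a hedged illustration of what treating coherence relations as a "controllable axis" could look like, here is a minimal Python sketch that builds one caption prompt per relation label. The relation names and glosses, `RELATIONS`, `CaptionRequest`, and `build_prompts` are all hypothetical and chosen for illustration; the paper and the linked repository define the actual relation set and prompt wording.

```python
from dataclasses import dataclass

# Hypothetical coherence-relation labels with short glosses. These are
# illustrative assumptions, not necessarily the set or wording used by RONA.
RELATIONS = {
    "Visible": "describe what is literally shown in the image",
    "Subjective": "express a reaction to or evaluation of the scene",
    "Action": "describe an action or event taking place in the image",
    "Story": "give background context or a narrative beyond the image",
    "Meta": "comment on how, when, or why the photo was taken",
}


@dataclass
class CaptionRequest:
    """One prompt targeting a single coherence relation (hypothetical type)."""
    relation: str
    prompt: str


def build_prompts(image_context: str, relations: dict[str, str]) -> list[CaptionRequest]:
    """Build one caption prompt per coherence relation (hypothetical helper).

    Only the relation changes between prompts; the image context stays fixed,
    so the variation across outputs is meant to be pragmatic, not visual.
    """
    requests = []
    for name, gloss in relations.items():
        prompt = (
            f"Image context: {image_context}\n"
            f"Write one caption whose relation to the image is '{name}': {gloss}."
        )
        requests.append(CaptionRequest(relation=name, prompt=prompt))
    return requests


if __name__ == "__main__":
    for req in build_prompts("a crowded beach at sunset", RELATIONS):
        # Print the relation and the instruction line of each prompt.
        print(req.relation, "->", req.prompt.splitlines()[1])
```

In a real pipeline each prompt would be sent to an MLLM together with the image itself; holding the visual input fixed while varying only the relation is one plausible way to obtain captions that differ in communicative intent rather than in surface wording.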

Anthology ID:: 2025.in2writing-1.8
Volume:: Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025)
Month:: May
Year:: 2025
Address:: Albuquerque, New Mexico, US
Editors:: Vishakh Padmakumar, Katy Gero, Thiemo Wambsganss, Sarah Sterman, Ting-Hao Huang, David Zhou, John Chung
Venues:: In2Writing | WS
Publisher:: Association for Computational Linguistics
Pages:: 74–86
URL:: https://aclanthology.org/2025.in2writing-1.8/
DOI:: 10.18653/v1/2025.in2writing-1.8
Cite (ACL):: Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, and Dongwon Lee. 2025. RONA: Pragmatically Diverse Image Captioning with Coherence Relations. In Proceedings of the Fourth Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2025), pages 74–86, Albuquerque, New Mexico, US. Association for Computational Linguistics.
Cite (Informal):: RONA: Pragmatically Diverse Image Captioning with Coherence Relations (Anantha Ramakrishnan et al., In2Writing 2025)
PDF:: https://aclanthology.org/2025.in2writing-1.8.pdf
