Bridging Underspecified Queries and Multimodal Retrieval: A Two-Stage Query Rewriting Approach

Szu-Ting Liu, Wen-Yu Cho, Hsin-Wei Wang, Berlin Chen


Abstract
Retrieval-Augmented Generation (RAG) has proven effective for text-only question answering, yet extending it to visually rich documents remains challenging. Existing multimodal benchmarks, often derived from visual question answering (VQA) datasets or from query-image pairs generated by large vision-language models (LVLMs), frequently contain underspecified questions that assume direct access to the image. To mitigate this issue, we propose a two-stage query rewriting framework that first generates OCR-based image descriptions and then reformulates queries into precise, retrieval-friendly forms under explicit constraints. Experiments show consistent improvements across dense, hybrid, and multimodal retrieval paradigms, with the most pronounced gains in visual document retrieval: Hits@1 rises from 21.0% to 56.6% with VDocRetriever and further to 79.3% when OCR-based descriptions are incorporated. These results indicate that query rewriting, particularly when combined with multimodal fusion, provides a reliable and scalable solution for bridging underspecified queries and improving retrieval over visually rich documents.
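The two-stage idea in the abstract (OCR-based description first, constrained query reformulation second) and the Hits@1 metric it reports can be illustrated with a small sketch. This is not the authors' code: the function names `describe_image`, `rewrite_query`, and `hits_at_k`, the `REWRITE_CONSTRAINTS` prompt wording, and the pluggable `ocr`/`llm` callables are all hypothetical assumptions, since the abstract does not specify prompts, models, or evaluation scripts. Hits@k is computed here under its common definition: the fraction of queries whose relevant document appears among the top k retrieved results.

```python
from typing import Callable, Sequence

# Hypothetical interfaces: any OCR engine and any text-only LLM can be plugged in.
OcrFn = Callable[[bytes], str]
LlmFn = Callable[[str], str]

# Hypothetical constraint wording; the paper's actual prompt is not reproduced here.
REWRITE_CONSTRAINTS = (
    "Rewrite the question so it can be answered without seeing the image. "
    "Replace references such as 'this chart' with concrete entities from the "
    "description. Do not add facts that are not in the description. "
    "Return only the rewritten question."
)


def describe_image(image: bytes, ocr: OcrFn, llm: LlmFn) -> str:
    """Stage 1 (sketch): turn the OCR text of an image into a short description."""
    text = ocr(image)
    return llm(f"Summarize this document image from its OCR text:\n{text}").strip()


def rewrite_query(query: str, description: str, llm: LlmFn) -> str:
    """Stage 2 (sketch): reformulate an underspecified query under explicit constraints."""
    prompt = (
        f"{REWRITE_CONSTRAINTS}\n\n"
        f"Image description:\n{description}\n\n"
        f"Original question:\n{query}"
    )
    return llm(prompt).strip()


def hits_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose gold document id appears in the top-k ranked ids."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for docs, g in zip(ranked, gold) if g in docs[:k])
    return hits / len(gold)
```

In use, the rewritten query (rather than the original underspecified one) would be sent to whichever dense, hybrid, or multimodal retriever is being evaluated, and `hits_at_k(rankings, gold_ids, k=1)` would then summarize retrieval quality. Injecting `ocr` and `llm` as plain callables keeps the sketch independent of any particular OCR library or model API.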
Anthology ID:
2025.rocling-main.7
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
63–70
URL:
https://aclanthology.org/2025.rocling-main.7/
Cite (ACL):
Szu-Ting Liu, Wen-Yu Cho, Hsin-Wei Wang, and Berlin Chen. 2025. Bridging Underspecified Queries and Multimodal Retrieval: A Two-Stage Query Rewriting Approach. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 63–70, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Bridging Underspecified Queries and Multimodal Retrieval: A Two-Stage Query Rewriting Approach (Liu et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.7.pdf