MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting

Rishav Sahay; Lavanya Sita Tekumalla; Anoop Saladi


Abstract

Existing multimodal systems typically associate text and available images based on embedding similarity or simple co-location, but such approaches often fail to ensure that the linked image accurately depicts the specific product or component mentioned in a troubleshooting instruction. We introduce MIRAGE, a metadata-first paradigm that treats structured metadata (not raw pixels) as a first-class modality for multimodal grounding. In MIRAGE, both text and images are projected through a shared semantic schema capturing product attributes, context, and visual aspects. This enables reasoning over interpretable attributes for troubleshooting rather than over unstructured embeddings. MIRAGE comprises three complementary modules: M-Link for schema-guided image–text linking, M-Gen for metadata-conditioned multimodal generation, and M-Eval for consistency evaluation in the same structured space. Experiments on large-scale enterprise e-commerce troubleshooting data, spanning 10 product types, 100K text chunks, and 35K images, show that metadata-centric grounding achieves over 40% higher linking coverage of high-quality visual content and over 45% improvement in linking and response quality compared with embedding-based baselines. MIRAGE demonstrates the potential of structured metadata to enable scalable, fine-grained grounding in multimodal troubleshooting systems.
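The contrast between schema-based and embedding-based linking can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the four schema fields, the rule that product type and component must match exactly, the agreement-fraction score, and the 0.75 threshold are hypothetical and are not taken from M-Link. The sketch only shows the general idea of deciding a link by comparing interpretable attributes. With this approach a mismatch, such as the wrong component, is explicit rather than hidden inside a similarity score.

```python
from dataclasses import dataclass

# Hypothetical schema fields; the abstract does not specify MIRAGE's actual schema.
SCHEMA_FIELDS = ("product_type", "component", "context", "visual_aspect")
# Hypothetical hard constraints: these fields must agree for any link to exist.
HARD_FIELDS = ("product_type", "component")


@dataclass(frozen=True)
class Metadata:
    product_type: str
    component: str
    context: str
    visual_aspect: str


def link_score(text_md: Metadata, image_md: Metadata) -> float:
    """Fraction of schema fields on which text and image metadata agree,
    or 0.0 if any hard field disagrees."""
    for field in HARD_FIELDS:
        if getattr(text_md, field) != getattr(image_md, field):
            return 0.0
    matches = sum(getattr(text_md, f) == getattr(image_md, f) for f in SCHEMA_FIELDS)
    return matches / len(SCHEMA_FIELDS)


def link_images(text_md: Metadata, images: dict, threshold: float = 0.75) -> list:
    """Return ids of images scoring at or above the threshold, best first
    (ties broken by id for a deterministic order)."""
    scored = [(link_score(text_md, md), img_id) for img_id, md in images.items()]
    kept = [(s, img_id) for s, img_id in scored if s >= threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [img_id for _, img_id in kept]


if __name__ == "__main__":
    chunk = Metadata("dishwasher", "filter", "cleaning", "close-up")
    images = {
        "img_a": Metadata("dishwasher", "filter", "cleaning", "close-up"),
        "img_b": Metadata("dishwasher", "filter", "installation", "close-up"),
        "img_c": Metadata("dishwasher", "spray arm", "cleaning", "close-up"),
        "img_d": Metadata("washing machine", "filter", "cleaning", "close-up"),
    }
    print(link_images(chunk, images))  # ['img_a', 'img_b']
```

In this example, `img_c` and `img_d` are rejected outright because the component or product differs. An embedding-similarity link might still pair such visually similar images with the instruction.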

Anthology ID: 2026.eacl-industry.56
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 764–776
URL: https://aclanthology.org/2026.eacl-industry.56/
Cite (ACL): Rishav Sahay, Lavanya Sita Tekumalla, and Anoop Saladi. 2026. MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 764–776, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting (Sahay et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-industry.56.pdf
