When Retrieval Hurts: Evidence Utilization, Script Fidelity, and Knowledge Conflicts in Multilingual RAG

Varalekshmy M Mohan; Swathi Jayakumar; Gadha Saji Menon; Sachin Kurup; Veena G; Vani Kanjirangat

When Retrieval Hurts: Evidence Utilization, Script Fidelity, and Knowledge Conflicts in Multilingual RAG

Varalekshmy M Mohan, Swathi Jayakumar, Gadha Saji Menon, Sachin Kurup, Veena G, Vani Kanjirangat

Abstract

The problem of extractive multilingual QA with LLMs is characterized by complex interactions among retrieval mechanisms, knowledge source configurations, prompting techniques, and scripting biases. Despite high retrieval quality, multilingual RAG often degrades performance, revealing a gap between retrieved evidence and its effective utilization. To address this issue, this paper offers an extensive empirical study that examines these components by comparing retrieval-augmented generation (RAG) with a non-RAG baseline across 21 typologically diverse languages and 5 leading LLMs. Our analysis includes five prompting strategies and multiple retrieval configurations, which enable a unified evaluation across diverse linguistic settings. We have also observed an evidence utilization gap in RAG settings, where RAG underperforms despite high retrieval hit rates due to models’ inefficiency in leveraging the retrieved evidence. We also introduce lightweight inference-time metrics to better characterize retrieval usage and conflict patterns.We also highlight script fidelity as a crucial factor in our experiments, as non-Latin-script languages experience significant performance drops and increased hallucinations without proper grounding. Further, we analyzed generator language preferences, systematically examined conflicts, and identified mechanisms for the effective detection and resolution of conflicts. The study further details how prompting strategies affect language families and script types, offering a detailed analysis for optimizing future multilingual RAG settings.

Anthology ID:: 2026.mellm-1.9
Volume:: Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
Month:: July
Year:: 2026
Address:: San Diego, United States
Editors:: Kaiyu Huang, Fengran Mo, Pinzhen Chen, Meng Jiang
Venues:: MeLLM | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 92–107
Language:
URL:: https://aclanthology.org/2026.mellm-1.9/
DOI:
Bibkey:
Cite (ACL):: Varalekshmy M Mohan, Swathi Jayakumar, Gadha Saji Menon, Sachin Kurup, Veena G, and Vani Kanjirangat. 2026. When Retrieval Hurts: Evidence Utilization, Script Fidelity, and Knowledge Conflicts in Multilingual RAG. In Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026), pages 92–107, San Diego, United States. Association for Computational Linguistics.
Cite (Informal):: When Retrieval Hurts: Evidence Utilization, Script Fidelity, and Knowledge Conflicts in Multilingual RAG (Mohan et al., MeLLM 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.mellm-1.9.pdf

PDF Cite Search Fix data