QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Mohammad Aflah Khan, Neemesh Yadav, Sarah Masud, Md. Shad Akhtar


Abstract
The rise of large language models (LLMs) has created a need for advanced benchmarking systems beyond traditional setups. To this end, we introduce QUENCH, a novel text-based English Quizzing Benchmark manually curated and transcribed from YouTube quiz videos. QUENCH contains masked entities and rationales that LLMs must predict via generation. Sitting at the intersection of world knowledge, geographical context, and commonsense reasoning, QUENCH assesses the world knowledge and deduction capabilities of LLMs in a zero-shot, open-domain quizzing setup. We perform an extensive evaluation across 7 LLMs and 4 metrics, investigating the influence of model size, prompting style, geographical context, and gold-labeled rationale generation. The benchmarking concludes with an error analysis of the various types of generative errors to which the LLMs are prone.
Anthology ID:
2025.coling-main.303
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
4493–4509
URL:
https://aclanthology.org/2025.coling-main.303/
Cite (ACL):
Mohammad Aflah Khan, Neemesh Yadav, Sarah Masud, and Md. Shad Akhtar. 2025. QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4493–4509, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs (Khan et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.303.pdf