TEEMIL : Towards Educational MCQ Difficulty Estimation in Indic Languages

Manikandan Ravikiran, Siddharth Vohra, Rajat Verma, Rohit Saluja, Arnav Bhavsar


Abstract
Difficulty estimation of multiple-choice questions (MCQs) is crucial for creating effective educational assessments, yet remains underexplored in Indic languages like Hindi and Kannada due to the lack of comprehensive datasets. This paper addresses this gap by introducing two datasets, TEEMIL-H and TEEMIL-K, containing 4689 and 4215 MCQs, respectively, with manually annotated difficulty labels. We benchmark these datasets using state-of-the-art multilingual models and conduct ablation studies to analyze the effect of context, the impact of options, and the presence of the None of the Above (NOTA) option on difficulty estimation. Our findings establish baselines for difficulty estimation in Hindi and Kannada, offering valuable insights into improving model performance and guiding future research in MCQ difficulty estimation.
Anthology ID: 2025.coling-main.142
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 2085–2099
URL: https://aclanthology.org/2025.coling-main.142/
Cite (ACL): Manikandan Ravikiran, Siddharth Vohra, Rajat Verma, Rohit Saluja, and Arnav Bhavsar. 2025. TEEMIL : Towards Educational MCQ Difficulty Estimation in Indic Languages. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2085–2099, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): TEEMIL : Towards Educational MCQ Difficulty Estimation in Indic Languages (Ravikiran et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.142.pdf