Rajat Verma
2025
TEEMIL: Towards Educational MCQ Difficulty Estimation in Indic Languages
Manikandan Ravikiran | Siddharth Vohra | Rajat Verma | Rohit Saluja | Arnav Bhavsar
Proceedings of the 31st International Conference on Computational Linguistics
Difficulty estimation of multiple-choice questions (MCQs) is crucial for creating effective educational assessments, yet remains underexplored in Indic languages like Hindi and Kannada due to the lack of comprehensive datasets. This paper addresses this gap by introducing two datasets, TEEMIL-H and TEEMIL-K, containing 4689 and 4215 MCQs, respectively, with manually annotated difficulty labels. We benchmark these datasets using state-of-the-art multilingual models and conduct ablation studies to analyze the effect of context, the impact of options, and the presence of the None of the Above (NOTA) option on difficulty estimation. Our findings establish baselines for difficulty estimation in Hindi and Kannada, offering valuable insights into improving model performance and guiding future research in MCQ difficulty estimation.