Nishant Baghel
2020
ScAA: A Dataset for Automated Short Answer Grading of Children’s free-text Answers in Hindi and Marathi
Dolly Agarwal | Somya Gupta | Nishant Baghel
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Automatic short answer grading (ASAG) techniques are designed to automatically assess short answers written in natural language. Beyond MCQs, evaluating free-text answers is essential for assessing children’s knowledge and understanding of a subject. However, assessing descriptive answers in low-resource languages in a linguistically diverse country like India poses significant hurdles. To address this assessment problem and advance NLP research in regional Indian languages, we present the Science Answer Assessment (ScAA) dataset of answers from children aged 8-14. The ScAA dataset is labeled 2-way (correct/incorrect) and contains 10,988 and 1,955 pairs of natural answers and model answers for Hindi and Marathi, respectively, for 32 questions. We benchmark various state-of-the-art ASAG methods and show that the data presents a strong challenge for future research.
Automated Assessment of Noisy Crowdsourced Free-text Answers for Hindi in Low Resource Setting
Dolly Agarwal | Somya Gupta | Nishant Baghel
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
The need to perform assessments continually and on a larger scale necessitates automated systems for evaluating learners’ responses to free-text questions. We target children aged 8-14 years and use an ASR-integrated assessment app to crowdsource learners’ responses to free-text questions in Hindi. The app helped collect 39,641 user answers to 35 different questions on Science topics. Since the users are young children from rural India who may not be well-equipped with technology, their answers contain various types of noise. We describe these noise types and propose a preprocessing pipeline to denoise users’ answers. We report the performance of different similarity metrics on the noisy and denoised versions of user and model answers. Our findings have large-scale applications for automated answer assessment for school children in India in low-resource settings.
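To make the idea of scoring a user answer against a model answer concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the noise types, the preprocessing steps, or the similarity metrics, so the punctuation set, the Jaccard token-overlap metric, the `threshold` value, and all function names below are assumptions chosen purely for illustration.

```python
import unicodedata

# Assumed punctuation set for illustration; includes the Devanagari danda.
PUNCTUATION = set("।॥,.?!;:\"'()-")


def normalize(text):
    """Apply Unicode NFC normalization, replace punctuation with spaces,
    and split the result on whitespace."""
    text = unicodedata.normalize("NFC", text)
    cleaned = "".join(" " if ch in PUNCTUATION else ch for ch in text)
    return cleaned.split()


def jaccard(user_answer, model_answer):
    """Token-level Jaccard similarity between two answers (0.0 to 1.0)."""
    a = set(normalize(user_answer))
    b = set(normalize(model_answer))
    if not a and not b:
        return 0.0  # two empty answers are treated as dissimilar
    return len(a & b) / len(a | b)


def grade(user_answer, model_answer, threshold=0.5):
    """Hypothetical 2-way decision: correct if similarity meets the threshold."""
    return jaccard(user_answer, model_answer) >= threshold
```

For example, `jaccard("पौधे सूर्य से ऊर्जा लेते हैं।", "पौधे सूर्य से ऊर्जा लेते हैं")` treats the two answers as identical once the trailing danda is stripped. Any real system would need to tune the threshold and the normalization against labeled data.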