Paolo Cremonesi
2021
Towards the Application of Calibrated Transformers to the Unsupervised Estimation of Question Difficulty from Text
Ekaterina Loginova | Luca Benedetto | Dries Benoit | Paolo Cremonesi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Being able to accurately perform Question Difficulty Estimation (QDE) can improve the accuracy of students’ assessment and better their learning experience. Traditional approaches to QDE are either subjective or introduce a long delay before new questions can be used to assess students. Thus, recent work proposed machine learning-based approaches to overcome these limitations. They use questions of known difficulty to train models capable of inferring the difficulty of questions from their text. Once trained, they can be used to perform QDE of newly created questions. Existing approaches employ supervised models which are domain-dependent and require a large dataset of questions of known difficulty for training. Therefore, they cannot be used if such a dataset is not available (e.g., for new courses on an e-learning platform). In this work, we experiment with the possibility of performing QDE from text in an unsupervised manner. Specifically, we use the uncertainty of calibrated question answering models as a proxy of human-perceived difficulty. Our experiments show promising results, suggesting that model uncertainty could be successfully leveraged to perform QDE from text, reducing both costs and elapsed time.
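The abstract does not describe the implementation; the snippet below is only a minimal sketch of the general idea of using a calibrated answering model's uncertainty as a difficulty proxy. It assumes per-option logits are already available for each multiple-choice question and uses temperature scaling followed by the entropy of the resulting distribution; the temperature value and logits are illustrative, not taken from the paper.

```python
# Minimal sketch (not the authors' code): uncertainty of a calibrated
# answering model as a proxy for question difficulty.
import numpy as np

def calibrated_probs(logits, temperature):
    """Temperature-scaled softmax (a common post-hoc calibration method)."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

def difficulty_proxy(logits, temperature=1.5):
    """Entropy of the calibrated answer distribution: higher = more uncertain = harder."""
    p = calibrated_probs(logits, temperature)
    return float(-(p * np.log(p + 1e-12)).sum())

# Hypothetical logits for two 4-option multiple-choice questions.
print(difficulty_proxy([3.2, 0.1, -0.5, 0.3]))  # confident model -> low proxy difficulty
print(difficulty_proxy([0.4, 0.3, 0.2, 0.1]))   # uncertain model -> high proxy difficulty
```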
On the application of Transformers for estimating the difficulty of Multiple-Choice Questions from text
Luca Benedetto | Giovanni Aradelli | Paolo Cremonesi | Andrea Cappelli | Andrea Giussani | Roberto Turrin
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Classical approaches to question calibration are either subjective or require newly created questions to be deployed before being calibrated. Recent works explored the possibility of estimating question difficulty from text, but did not experiment with the most recent NLP models, in particular Transformers. In this paper, we compare the performance of previous literature with Transformer models experimenting on a public and a private dataset. Our experimental results show that Transformers are capable of outperforming previously proposed models. Moreover, if an additional corpus of related documents is available, Transformers can leverage that information to further improve calibration accuracy. We characterize the dependence of the model performance on some properties of the questions, showing that it performs best on questions ending with a question mark and Multiple-Choice Questions (MCQs) with one correct choice.
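As a rough illustration of the supervised setting described above (not the paper's released code), one common recipe is to fine-tune a pretrained Transformer encoder as a single-output regressor on (question text, difficulty) pairs. The model name, toy data, and hyper-parameters below are assumptions made for the sake of a self-contained example.

```python
# Minimal sketch: fine-tuning a Transformer to regress question difficulty from text.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=1, problem_type="regression"
)

# Toy calibrated questions: text plus a known difficulty (e.g. from an IRT calibration).
data = [("What is 2 + 2?", -1.8),
        ("Prove that the square root of 2 is irrational.", 1.3)]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for text, difficulty in data:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        labels = torch.tensor([[difficulty]], dtype=torch.float)
        out = model(**batch, labels=labels)  # MSE loss in regression mode
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

At inference time, the single logit produced for a new question's text is read directly as its estimated difficulty, so newly written questions can be calibrated before being deployed.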