Automating True-False Multiple-Choice Question Generation and Evaluation with Retrieval-based Accuracy Differential

Chen-Jui Yu, Wen Hung Lee, Lin Tse Ke, Shih-Wei Guo, Yao-Chung Fan


Abstract
Creating high-quality True-False (TF) multiple-choice questions (MCQs) with accurate distractors is a challenging and time-consuming task in education. This paper introduces True-False Distractor Generation (TFDG), a pipeline that leverages pre-trained language models and sentence retrieval techniques to automate the generation of TF-type MCQ distractors. The evaluation of generated TF questions presents a further challenge: traditional metrics such as BLEU and ROUGE are unsuitable for this task. To address this, we propose a new evaluation metric called Retrieval-based Accuracy Differential (RAD). RAD assesses the discriminative power of TF questions by comparing model accuracy with and without access to reference texts, quantitatively evaluating how well questions differentiate between students with varying knowledge levels. This research benefits educators and assessment developers, facilitating the efficient automatic generation of high-quality TF-type MCQs and their reliable evaluation.
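The abstract characterizes RAD as the difference in answering accuracy when a model does and does not see the reference text. The following is a minimal sketch of that computation, assuming a hypothetical answer_tf(statement, context) model call and a simple question record layout; it illustrates the idea rather than the authors' implementation.

    from typing import Callable, Dict, List

    def rad_score(
        questions: List[Dict],                  # each: {"statement", "label", "reference"} (assumed layout)
        answer_tf: Callable[[str, str], bool],  # hypothetical model call: (statement, context) -> True/False
    ) -> float:
        """RAD = accuracy with the reference text minus accuracy without it."""
        correct_with = correct_without = 0
        for q in questions:
            gold = q["label"]
            # "Informed" setting: the model sees the reference passage.
            if answer_tf(q["statement"], q["reference"]) == gold:
                correct_with += 1
            # "Uninformed" setting: the model answers from the statement alone.
            if answer_tf(q["statement"], "") == gold:
                correct_without += 1
        n = len(questions)
        return (correct_with - correct_without) / n if n else 0.0

A larger RAD indicates a question set that informed readers answer correctly far more often than uninformed ones, i.e. questions with stronger discriminative power.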
Anthology ID:
2024.inlg-main.16
Volume:
Proceedings of the 17th International Natural Language Generation Conference
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
198–212
URL:
https://aclanthology.org/2024.inlg-main.16
Cite (ACL):
Chen-Jui Yu, Wen Hung Lee, Lin Tse Ke, Shih-Wei Guo, and Yao-Chung Fan. 2024. Automating True-False Multiple-Choice Question Generation and Evaluation with Retrieval-based Accuracy Differential. In Proceedings of the 17th International Natural Language Generation Conference, pages 198–212, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Automating True-False Multiple-Choice Question Generation and Evaluation with Retrieval-based Accuracy Differential (Yu et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-main.16.pdf