Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning

Guillaume Le Berre, Christophe Cerisara, Philippe Langlais, Guy Lapalme


Abstract
Pre-trained models have shown very good performance on a number of question answering benchmarks, especially when fine-tuned on multiple question answering datasets at once. In this work, we propose an approach for generating a fine-tuning dataset with a rule-based algorithm that produces questions and answers from unannotated sentences. We show that the state-of-the-art model UnifiedQA greatly benefits from such a system on a multiple-choice benchmark about physics, biology, and chemistry that it has never been trained on. We further show that improved performance can be obtained by selecting the most challenging distractors (wrong answers) with a dedicated ranker based on a pre-trained RoBERTa model.
Anthology ID:
2022.acl-short.83
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
732–738
URL:
https://aclanthology.org/2022.acl-short.83
DOI:
10.18653/v1/2022.acl-short.83
Cite (ACL):
Guillaume Le Berre, Christophe Cerisara, Philippe Langlais, and Guy Lapalme. 2022. Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 732–738, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning (Le Berre et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-short.83.pdf
Software:
2022.acl-short.83.software.zip
Data:
CommonsenseQA, QASC, SQuAD, SciQ