Handling Anomalies of Synthetic Questions in Unsupervised Question Answering

Giwon Hong, Junmo Kang, Doyeon Lim, Sung-Hyon Myaeng


Abstract
Advances in Question Answering (QA) research require additional datasets for new domains, languages, and types of questions, as well as for performance increases. Human creation of a QA dataset like SQuAD, however, is expensive. As an alternative, an unsupervised QA approach has been proposed so that QA training data can be generated automatically. However, the performance of unsupervised QA is much lower than that of supervised QA models. We identify two anomalies in the automatically generated questions and propose how they can be mitigated. We show our approach helps improve unsupervised QA significantly across a number of QA tasks.
Anthology ID:
2020.coling-main.306
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3441–3448
URL:
https://aclanthology.org/2020.coling-main.306
DOI:
10.18653/v1/2020.coling-main.306
Cite (ACL):
Giwon Hong, Junmo Kang, Doyeon Lim, and Sung-Hyon Myaeng. 2020. Handling Anomalies of Synthetic Questions in Unsupervised Question Answering. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3441–3448, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Handling Anomalies of Synthetic Questions in Unsupervised Question Answering (Hong et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.306.pdf
Data
HotpotQA, MRQA, Natural Questions, NewsQA, SQuAD, SearchQA, TriviaQA