Ryan Spring
2020
Training Question Answering Models From Synthetic Data
Raul Puri | Ryan Spring | Mohammad Shoeybi | Mostofa Patwary | Bryan Catanzaro
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human labeled data. However, a considerable gap remains between synthetic and human-generated question-answer pairs. This work aims to narrow this gap by taking advantage of large language models and explores several factors such as model size, quality of pretrained models, scale of data synthesized, and algorithmic choices. On the SQuAD1.1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1.1 training set questions alone. Removing access to real Wikipedia data, we synthesize questions and answers from a synthetic text corpus generated by an 8.3 billion parameter GPT-2 model and achieve 88.4 Exact Match (EM) and 93.9 F1 score on the SQuAD1.1 dev set. We further apply our methodology to SQuAD2.0 and show a 2.8 absolute gain on EM score compared to prior work using synthetic data.
2018
The Effect of L2 Onset on L2 and L3 learning: The Case of Native Speakers of Burkinabe languages
Alain Hien | Ryan Spring
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
2010
A Look into the Acquisition of English Motion Event Conflation by Native Speakers of Chinese and Japanese
Ryan Spring
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation