PPDAC: A Plug-and-Play Data Augmentation Component for Few-shot Extractive Question Answering

Huang Qi, Fu Han, Luo Wenbin, Wang Mingwen, Luo Kaiwei


Abstract
Extractive Question Answering (EQA) in the few-shot learning scenario is one of the most challenging tasks of Machine Reading Comprehension (MRC). Some previous works employ external knowledge for data augmentation to improve the performance of few-shot extractive question answering. However, external knowledge is not always available, nor are the language- and domain-specific NLP tools needed to process it, such as part-of-speech taggers, syntactic parsers, and named-entity recognizers. In this paper, we present a novel Plug-and-Play Data Augmentation Component (PPDAC) for few-shot extractive question answering, which includes a paraphrase generator and a paraphrase selector. Specifically, we generate multiple paraphrases of the question in the (question, passage, answer) triples using the paraphrase generator and then obtain highly similar statements via the paraphrase selector to form more training data for fine-tuning. Extensive experiments on multiple EQA datasets show that our proposed plug-and-play data augmentation component significantly improves question-answering performance, and consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
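The generate-then-select flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which models serve as the paraphrase generator or selector, so `toy_generator` (a hard-coded rewriter), `jaccard` (token-overlap similarity), and the `threshold` and `top_k` parameters are hypothetical stand-ins, not the authors' components.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (question, passage, answer)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for the paper's selector score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def toy_generator(question: str) -> List[str]:
    """Hypothetical stand-in for a paraphrase generator (normally a model)."""
    return [
        question.replace("What is", "What's"),
        question.replace("What is", "Which is"),
        "Tell me something unrelated",  # a poor candidate the selector should drop
    ]


def augment(
    triples: List[Triple],
    generate: Callable[[str], List[str]],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
    top_k: int = 2,          # assumed cap on paraphrases kept per question
) -> List[Triple]:
    """Return the original triples plus triples with selected paraphrased questions."""
    augmented = list(triples)
    for question, passage, answer in triples:
        # Generate candidates, ignoring any that are identical to the original.
        candidates = [c for c in generate(question) if c != question]
        # Score each candidate against the original question, best first.
        scored = sorted(((similarity(question, c), c) for c in candidates), reverse=True)
        kept = [c for score, c in scored if score >= threshold][:top_k]
        # Passage and answer are reused unchanged; only the question varies.
        augmented.extend((c, passage, answer) for c in kept)
    return augmented


triples = [("What is the capital of France?", "Paris is the capital of France.", "Paris")]
augmented = augment(triples, toy_generator, jaccard)
for question, _, _ in augmented:
    print(question)
```

Because only the question is rewritten, the answer span in the passage stays valid for every new triple, which is why this kind of augmentation can be attached to an existing fine-tuning pipeline without relabeling.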
Anthology ID:
2024.ccl-1.102
Volume:
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:
July
Year:
2024
Address:
Taiyuan, China
Editors:
Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1320–1333
Language:
English
URL:
https://aclanthology.org/2024.ccl-1.102/
Cite (ACL):
Huang Qi, Fu Han, Luo Wenbin, Wang Mingwen, and Luo Kaiwei. 2024. PPDAC: A Plug-and-Play Data Augmentation Component for Few-shot Extractive Question Answering. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1320–1333, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):
PPDAC: A Plug-and-Play Data Augmentation Component for Few-shot Extractive Question Answering (Qi et al., CCL 2024)
PDF:
https://aclanthology.org/2024.ccl-1.102.pdf