Using Paraphrasing and Memory-Augmented Models to Combat Data Sparsity in Question Interpretation with a Virtual Patient Dialogue System

Lifeng Jin, David King, Amad Hussein, Michael White, Douglas Danforth


Abstract
When interpreting questions in a virtual patient dialogue system, one must inevitably tackle the challenge of a long tail of relatively infrequently asked questions. To make progress on this challenge, we investigate the use of paraphrasing for data augmentation and neural memory-based classification, finding that the two methods work best in combination. In particular, we find that the neural memory-based approach not only outperforms a straight CNN classifier on low-frequency questions, but also takes better advantage of the augmented data created by paraphrasing, together yielding a nearly 10% absolute improvement in accuracy on the least frequently asked questions.
Anthology ID:
W18-0502
Volume:
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Joel Tetreault, Jill Burstein, Ekaterina Kochmar, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
13–23
URL:
https://aclanthology.org/W18-0502
DOI:
10.18653/v1/W18-0502
Cite (ACL):
Lifeng Jin, David King, Amad Hussein, Michael White, and Douglas Danforth. 2018. Using Paraphrasing and Memory-Augmented Models to Combat Data Sparsity in Question Interpretation with a Virtual Patient Dialogue System. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 13–23, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Using Paraphrasing and Memory-Augmented Models to Combat Data Sparsity in Question Interpretation with a Virtual Patient Dialogue System (Jin et al., BEA 2018)
PDF:
https://aclanthology.org/W18-0502.pdf