Active Learning for New Domains in Natural Language Understanding

Stanislav Peshterliev, John Kearney, Abhyuday Jagannatha, Imre Kiss, Spyros Matsoukas


Abstract
We explore active learning (AL) for improving accuracy on new domains in a natural language understanding (NLU) system. We propose an algorithm called Majority-CRF that uses an ensemble of classification models to guide the selection of relevant utterances, as well as a sequence labeling model to help prioritize informative examples. Experiments with three domains show that Majority-CRF achieves 6.6%–9% relative error rate reduction compared to random sampling with the same annotation budget, and statistically significant improvements compared to other AL approaches. Additionally, case studies with human-in-the-loop AL on six new domains show 4.6%–9% improvement on an existing NLU system.
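The abstract describes Majority-CRF only at a high level: an ensemble of classifiers decides which utterances are relevant to the new domain, and a sequence labeling model ranks them by informativeness. The following is a minimal sketch of that two-stage selection. It assumes that "relevant" means a strict majority of the ensemble votes in-domain, and that "informative" means low confidence from the sequence labeler. The interfaces `Classifier` and `SequenceConfidence`, the function `majority_crf_select`, and the `threshold` parameter are all hypothetical. This is not the authors' implementation, and the paper's actual models, thresholds, and tie-breaking may differ.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
# a domain classifier maps an utterance to P(in-domain);
# the sequence labeler maps an utterance to the probability of its best label sequence.
Classifier = Callable[[str], float]
SequenceConfidence = Callable[[str], float]


def majority_crf_select(
    pool: Sequence[str],
    classifiers: Sequence[Classifier],
    crf_confidence: SequenceConfidence,
    budget: int,
    threshold: float = 0.5,
) -> List[str]:
    """Pick up to `budget` unlabeled utterances to send for annotation."""
    # Stage 1: keep utterances that a strict majority of the ensemble
    # scores as in-domain (probability >= threshold).
    majority = len(classifiers) // 2 + 1
    relevant = [
        u for u in pool
        if sum(clf(u) >= threshold for clf in classifiers) >= majority
    ]
    # Stage 2: least-confident-first ranking by the sequence labeler
    # (assumed proxy for informativeness); sort is stable for ties.
    relevant.sort(key=crf_confidence)
    return relevant[:budget]


if __name__ == "__main__":
    # Toy stand-ins for trained models (illustrative only).
    pool = ["play jazz", "order pizza", "play some rock music", "what time is it"]
    ensemble = [
        lambda u: 0.9 if "play" in u else 0.1,
        lambda u: 0.8 if ("music" in u or "jazz" in u) else 0.2,
        lambda u: 0.7 if "play" in u else 0.3,
    ]
    confidences = {"play jazz": 0.95, "play some rock music": 0.4}
    picked = majority_crf_select(pool, ensemble, lambda u: confidences.get(u, 1.0), budget=1)
    print(picked)
```

Filtering before ranking keeps the annotation budget from being spent on out-of-domain utterances that an uncertainty-only criterion would often favor. Ranking inside the filtered set then targets the examples the labeler handles worst.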
Anthology ID:
N19-2012
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Anastassia Loukina, Michelle Morales, Rohit Kumar
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
90–96
URL:
https://aclanthology.org/N19-2012
DOI:
10.18653/v1/N19-2012
Cite (ACL):
Stanislav Peshterliev, John Kearney, Abhyuday Jagannatha, Imre Kiss, and Spyros Matsoukas. 2019. Active Learning for New Domains in Natural Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pages 90–96, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Active Learning for New Domains in Natural Language Understanding (Peshterliev et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-2012.pdf