Combining Weakly Supervised ML Techniques for Low-Resource NLU

Victor Soto, Konstantine Arkoudas


Abstract
Recent advances in transfer learning have improved the performance of virtual assistants considerably. Nevertheless, creating sophisticated voice-enabled applications for new domains remains a challenge, and meager training data is often a key bottleneck. Accordingly, unsupervised learning and semi-supervised learning (SSL) techniques continue to be of vital importance. While a number of such methods have been explored previously in isolation, in this paper we investigate the synergistic use of several weakly supervised techniques, with a view to improving natural language understanding (NLU) accuracy in low-resource settings. We explore three approaches that incorporate anonymized, unlabeled, and automatically transcribed user utterances into the training process: two focused on data augmentation via SSL and a third focused on unsupervised and transfer learning. We show promising results, with each individual approach yielding relative improvements in semantic error rate ranging from 4.73% to 7.65%. Moreover, combining all three methods yields a relative improvement of 11.77% over our current baseline model. Our methods are applicable to any new domain with minimal training data, and can be deployed over time in a cycle of continual learning.
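
The abstract does not detail the individual techniques, but as a rough illustration of the SSL data-augmentation family it mentions, the sketch below shows generic confidence-thresholded pseudo-labeling of unlabeled utterances with scikit-learn. The toy utterances, intent labels, model choice, and confidence threshold are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal self-training (pseudo-labeling) sketch: one common form of SSL data
# augmentation. Toy data, model, and the confidence threshold are assumptions,
# not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Small labeled seed set (hypothetical intent labels).
labeled_utts = ["play some jazz", "turn off the lights",
                "play my workout playlist", "dim the bedroom lights"]
labels = ["PlayMusic", "ControlLight", "PlayMusic", "ControlLight"]

# Unlabeled pool, e.g. anonymized, automatically transcribed user utterances.
unlabeled_utts = ["put on some rock music", "switch the kitchen lights on",
                  "what is the weather like"]

# 1) Train a seed intent classifier on the labeled data.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(labeled_utts, labels)

# 2) Pseudo-label the unlabeled pool and keep only confident predictions.
THRESHOLD = 0.6  # assumed value; real systems would tune this on held-out data
probs = model.predict_proba(unlabeled_utts)
preds = model.classes_[probs.argmax(axis=1)]
confident = probs.max(axis=1) >= THRESHOLD

aug_utts = labeled_utts + [u for u, keep in zip(unlabeled_utts, confident) if keep]
aug_labels = labels + [p for p, keep in zip(preds, confident) if keep]

# 3) Retrain on the augmented set; in a continual-learning cycle, steps 2-3
#    would be repeated as new unlabeled traffic arrives.
model.fit(aug_utts, aug_labels)
print(model.predict(["play a podcast"]))
```

In practice the pseudo-labeled examples would come from live traffic and the loop would be rerun periodically, which is the kind of continual-learning deployment the abstract alludes to.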
Anthology ID:
2021.naacl-industry.36
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
Month:
June
Year:
2021
Address:
Online
Editors:
Young-bum Kim, Yunyao Li, Owen Rambow
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
288–295
URL:
https://aclanthology.org/2021.naacl-industry.36
DOI:
10.18653/v1/2021.naacl-industry.36
Cite (ACL):
Victor Soto and Konstantine Arkoudas. 2021. Combining Weakly Supervised ML Techniques for Low-Resource NLU. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, pages 288–295, Online. Association for Computational Linguistics.
Cite (Informal):
Combining Weakly Supervised ML Techniques for Low-Resource NLU (Soto & Arkoudas, NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-industry.36.pdf
Video:
https://aclanthology.org/2021.naacl-industry.36.mp4