On the Use of External Data for Spoken Named Entity Recognition

Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu Han


Abstract
Spoken language understanding (SLU) tasks involve mapping from speech signals to semantic labels. Given the complexity of such tasks, good performance is expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work, we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We consider self-training, knowledge distillation, and transfer learning for end-to-end (E2E) and pipeline (speech recognition followed by text NER) approaches. We find that several of these approaches improve performance in resource-constrained settings beyond the benefits from pre-trained representations. Compared to prior work, we find relative improvements in F1 of up to 16%. While the best baseline model is a pipeline approach, the best performance using external data is ultimately achieved by an E2E model. We provide detailed comparisons and analyses, developing insights on, for example, the effects of leveraging external data on (i) different categories of NER errors and (ii) the switch in performance trends between pipeline and E2E models.
Anthology ID:
2022.naacl-main.53
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
724–737
URL:
https://aclanthology.org/2022.naacl-main.53
DOI:
10.18653/v1/2022.naacl-main.53
Cite (ACL):
Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, and Kyu Han. 2022. On the Use of External Data for Spoken Named Entity Recognition. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 724–737, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
On the Use of External Data for Spoken Named Entity Recognition (Pasad et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.53.pdf
Video:
https://aclanthology.org/2022.naacl-main.53.mp4
Code:
asappresearch/spoken-ner
Data:
OntoNotes 5.0, SLUE