BibTeX
@inproceedings{pasad-etal-2022-use,
title = "On the Use of External Data for Spoken Named Entity Recognition",
author = "Pasad, Ankita and
Wu, Felix and
Shon, Suwon and
Livescu, Karen and
Han, Kyu",
editor = "Carpuat, Marine and
de Marneffe, Marie-Catherine and
Meza Ruiz, Ivan Vladimir",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.naacl-main.53",
doi = "10.18653/v1/2022.naacl-main.53",
pages = "724--737",
abstract = "Spoken language understanding (SLU) tasks involve mapping from speech signals to semantic labels. Given the complexity of such tasks, good performance is expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work, we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We consider self-training, knowledge distillation, and transfer learning for end-to-end (E2E) and pipeline (speech recognition followed by text NER) approaches. We find that several of these approaches improve performance in resource-constrained settings beyond the benefits from pre-trained representations. Compared to prior work, we find relative improvements in F1 of up to 16{\%}. While the best baseline model is a pipeline approach, the best performance using external data is ultimately achieved by an E2E model. We provide detailed comparisons and analyses, developing insights on, for example, the effects of leveraging external data on (i) different categories of NER errors and (ii) the switch in performance trends between pipeline and E2E models.",
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="pasad-etal-2022-use">
<titleInfo>
<title>On the Use of External Data for Spoken Named Entity Recognition</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ankita</namePart>
<namePart type="family">Pasad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Felix</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Suwon</namePart>
<namePart type="family">Shon</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Karen</namePart>
<namePart type="family">Livescu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kyu</namePart>
<namePart type="family">Han</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
</titleInfo>
<name type="personal">
<namePart type="given">Marine</namePart>
<namePart type="family">Carpuat</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marie-Catherine</namePart>
<namePart type="family">de Marneffe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ivan</namePart>
<namePart type="given">Vladimir</namePart>
<namePart type="family">Meza Ruiz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Seattle, United States</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Spoken language understanding (SLU) tasks involve mapping from speech signals to semantic labels. Given the complexity of such tasks, good performance is expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work, we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We consider self-training, knowledge distillation, and transfer learning for end-to-end (E2E) and pipeline (speech recognition followed by text NER) approaches. We find that several of these approaches improve performance in resource-constrained settings beyond the benefits from pre-trained representations. Compared to prior work, we find relative improvements in F1 of up to 16%. While the best baseline model is a pipeline approach, the best performance using external data is ultimately achieved by an E2E model. We provide detailed comparisons and analyses, developing insights on, for example, the effects of leveraging external data on (i) different categories of NER errors and (ii) the switch in performance trends between pipeline and E2E models.</abstract>
<identifier type="citekey">pasad-etal-2022-use</identifier>
<identifier type="doi">10.18653/v1/2022.naacl-main.53</identifier>
<location>
<url>https://aclanthology.org/2022.naacl-main.53</url>
</location>
<part>
<date>2022-07</date>
<extent unit="page">
<start>724</start>
<end>737</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T On the Use of External Data for Spoken Named Entity Recognition
%A Pasad, Ankita
%A Wu, Felix
%A Shon, Suwon
%A Livescu, Karen
%A Han, Kyu
%Y Carpuat, Marine
%Y de Marneffe, Marie-Catherine
%Y Meza Ruiz, Ivan Vladimir
%S Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
%D 2022
%8 July
%I Association for Computational Linguistics
%C Seattle, United States
%F pasad-etal-2022-use
%X Spoken language understanding (SLU) tasks involve mapping from speech signals to semantic labels. Given the complexity of such tasks, good performance is expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with limited labeled data. In this work, we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task? We consider self-training, knowledge distillation, and transfer learning for end-to-end (E2E) and pipeline (speech recognition followed by text NER) approaches. We find that several of these approaches improve performance in resource-constrained settings beyond the benefits from pre-trained representations. Compared to prior work, we find relative improvements in F1 of up to 16%. While the best baseline model is a pipeline approach, the best performance using external data is ultimately achieved by an E2E model. We provide detailed comparisons and analyses, developing insights on, for example, the effects of leveraging external data on (i) different categories of NER errors and (ii) the switch in performance trends between pipeline and E2E models.
%R 10.18653/v1/2022.naacl-main.53
%U https://aclanthology.org/2022.naacl-main.53
%U https://doi.org/10.18653/v1/2022.naacl-main.53
%P 724-737
Markdown (Informal)
[On the Use of External Data for Spoken Named Entity Recognition](https://aclanthology.org/2022.naacl-main.53) (Pasad et al., NAACL 2022)
ACL
- Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, and Kyu Han. 2022. On the Use of External Data for Spoken Named Entity Recognition. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 724–737, Seattle, United States. Association for Computational Linguistics.