2023
Why Aren’t We NER Yet? Artifacts of ASR Errors in Named Entity Recognition in Spontaneous Speech Transcripts
Piotr Szymański, Lukasz Augustyniak, Mikolaj Morzy, Adrian Szymczak, Krzysztof Surdyk, Piotr Żelasko
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Transcripts of spontaneous human speech present a significant obstacle for traditional NER models. The lack of grammatical structure of spoken utterances and word errors introduced by the ASR make downstream NLP tasks challenging. In this paper, we examine in detail the complex relationship between ASR and NER errors which limit the ability of NER models to recover entity mentions from spontaneous speech transcripts. Using publicly available benchmark datasets (SWNE, Earnings-21, OntoNotes), we present the full taxonomy of ASR-NER errors and measure their true impact on entity recognition. We find that NER models fail spectacularly even if no word errors are introduced by the ASR. We also show why the F1 score is inadequate to evaluate NER models on conversational transcripts.
2021
What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition
Piotr Żelasko, Raghavendra Pappagari, Najim Dehak
Transactions of the Association for Computational Linguistics, Volume 9
Dialog acts can be interpreted as the atomic units of a conversation, more fine-grained than utterances, characterized by a specific communicative function. The ability to structure a conversational transcript as a sequence of dialog acts—dialog act recognition, including the segmentation—is critical for understanding dialog. We apply two pre-trained transformer models, XLNet and Longformer, to this task in English and achieve strong results on Switchboard Dialog Act and Meeting Recorder Dialog Act corpora with dialog act segmentation error rates (DSER) of 8.4% and 14.2%. To understand the key factors affecting dialog act recognition, we perform a comparative analysis of models trained under different conditions. We find that the inclusion of a broader conversational context helps disambiguate many dialog act classes, especially those infrequent in the training data. The presence of punctuation in the transcripts has a massive effect on the models’ performance, and a detailed analysis reveals specific segmentation patterns observed in its absence. Finally, we find that the label set specificity does not affect dialog act segmentation performance. These findings have significant practical implications for spoken language understanding applications that depend heavily on a good-quality segmentation being available.
2020
WER we are and WER we think we are
Piotr Szymański, Piotr Żelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Żyła-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, Yishay Carmiel
Findings of the Association for Computational Linguistics: EMNLP 2020
Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and on the HUB’05 public benchmark. We show that WERs are significantly higher than the best reported results. We formulate a set of guidelines which may aid in the creation of real-life, multi-domain datasets with high-quality annotations for training and testing of robust ASR systems.
2018
An Application for Building a Polish Telephone Speech Corpus
Bartosz Ziółko, Piotr Żelasko, Ireneusz Gawlik, Tomasz Pędzimąż, Tomasz Jadczyk
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient?
Piotr Żelasko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)