Conversational Speech Recognition Needs Data? Experiments with Austrian German

Julian Linke, Philip N. Garner, Gernot Kubin, Barbara Schuppler


Abstract
Conversational speech represents one of the most complex automatic speech recognition (ASR) tasks owing to the high inter-speaker variation in both pronunciation and conversational dynamics. This complexity is particularly problematic in low-resourced (LR) scenarios. Recent developments in self-supervision have allowed such scenarios to take advantage of large amounts of otherwise unrelated data. In this study, we characterise an LR Austrian German conversational task. We begin with a non-pre-trained baseline and show that fine-tuning a model pre-trained using self-supervision leads to improvements consistent with those in the literature; this extends to cases where a lexicon and language model are included. We also show that the advantage of pre-training indeed arises from the larger database rather than the self-supervision. Further, using a leave-one-conversation-out technique, we demonstrate that robustness problems remain with respect to inter-speaker and inter-conversation variation. This serves to guide where future research might best be focused in light of the current state of the art.
Anthology ID:
2022.lrec-1.500
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4684–4691
URL:
https://aclanthology.org/2022.lrec-1.500
Cite (ACL):
Julian Linke, Philip N. Garner, Gernot Kubin, and Barbara Schuppler. 2022. Conversational Speech Recognition Needs Data? Experiments with Austrian German. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4684–4691, Marseille, France. European Language Resources Association.
Cite (Informal):
Conversational Speech Recognition Needs Data? Experiments with Austrian German (Linke et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.500.pdf