Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR

Nina Markl, Stephen Joseph McNulty

Abstract
Although variation is a fundamental characteristic of natural language, automatic speech recognition (ASR) systems perform systematically worse on non-standardised and marginalised language varieties. In this paper we use the lens of language policy to analyse how current industry practices in training and testing ASR systems produce the data bias that gives rise to these systematic error differences. We believe this is a useful perspective for speech and language technology practitioners to understand the origins and harms of algorithmic bias, and how they can mitigate it. We also propose a re-framing of language resources as (public) infrastructure which should be designed not solely for markets, but for, and with the meaningful cooperation of, speech communities.
Anthology ID: 2022.lrec-1.680
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6328–6339
URL: https://aclanthology.org/2022.lrec-1.680
Cite (ACL): Nina Markl and Stephen Joseph McNulty. 2022. Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6328–6339, Marseille, France. European Language Resources Association.
Cite (Informal): Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR (Markl & McNulty, LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.680.pdf