An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language

Robert Jimerson, Zoey Liu, Emily Prud’hommeaux


Abstract
Advances in deep neural models for automatic speech recognition (ASR) have yielded dramatic improvements in ASR quality for resource-rich languages, with English ASR now achieving word error rates comparable to those of human transcribers. The vast majority of the world’s languages, however, lack the quantity of data necessary to approach this level of accuracy. In this paper we use four of the most popular ASR toolkits to train ASR models for fifteen languages with limited ASR training resources: eleven widely spoken languages of Africa, Asia, and South America, one endangered language of Central America, and three critically endangered languages of North America. We find that no single architecture consistently outperforms any other. Thus far, these differences in performance do not appear to be related to any particular feature of the datasets or characteristic of the languages. These findings have important implications for future research in ASR for under-resourced languages. ASR systems for languages with abundant existing media and available speakers may derive the most benefit simply from collecting large amounts of additional acoustic and textual training data. Communities using ASR to support endangered language documentation efforts, who cannot easily collect more data, might instead focus on exploring multiple architectures and hyperparameterizations to optimize performance within the constraints of their available data and resources.
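The abstract’s closing recommendation, comparing several candidate architectures on the same held-out data and keeping whichever scores best, can be sketched in a few lines of Python. The sketch below is not from the paper: the system labels and transcripts are hypothetical, and it assumes the jiwer package for computing word error rate (WER), the metric the paper reports.

# A minimal sketch (assumptions noted above) of the "try several
# architectures" strategy: given hypothesis transcripts produced by each
# candidate ASR system for the same held-out set, compute corpus-level
# WER per system and report the lowest-scoring one.
from jiwer import wer

# Reference transcripts for a held-out evaluation set (hypothetical data).
references = [
    "the committee met on tuesday",
    "she recorded three new stories",
]

# Hypothesis transcripts from each candidate architecture (hypothetical).
hypotheses = {
    "kaldi_hmm_dnn": ["the committee met on tuesday", "she recorded the new stories"],
    "wav2vec2_finetuned": ["a committee met tuesday", "she recorded three new stories"],
}

# Score every system on the same references, then pick the minimum.
scores = {name: wer(references, hyps) for name, hyps in hypotheses.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: WER = {score:.3f}")
print(f"Best architecture on this data: {min(scores, key=scores.get)}")

In practice each entry in the dictionary would come from decoding the evaluation audio with a separately trained model, and the same loop extends naturally over hyperparameter settings as well as architectures.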
Anthology ID:
2023.acl-short.87
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1008–1016
URL:
https://aclanthology.org/2023.acl-short.87
DOI:
10.18653/v1/2023.acl-short.87
Cite (ACL):
Robert Jimerson, Zoey Liu, and Emily Prud’hommeaux. 2023. An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1008–1016, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language (Jimerson et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.87.pdf
Video:
https://aclanthology.org/2023.acl-short.87.mp4