Justin Spence
2023
Studying the impact of language model size for low-resource ASR
Zoey Liu, Justin Spence, Emily Prud’hommeaux
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Zoey Liu, Justin Spence, Emily Prud’hommeaux
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Many automatic speech recognition (ASR) data sets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This “hold-speaker(s)-out” data partitioning strategy, however, may not be ideal for data sets in which the number of speakers is very small. This study investigates ten different data split methods for five languages with minimal ASR training resources. We find that (1) model performance varies greatly depending on which speaker is selected for testing; (2) the average word error rate (WER) across all held-out speakers is comparable not only to the average WER over multiple random splits but also to any given individual random split; (3) WER is also generally comparable when the data is split heuristically or adversarially; (4) utterance duration and intensity are comparatively more predictive factors of variability regardless of the data split. These results suggest that the widely used hold-speakers-out approach to ASR data partitioning can yield results that do not reflect model performance on unseen data or speakers. Random splits can yield more reliable and generalizable estimates when facing data sparsity.
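The two families of splits contrasted in this abstract can be illustrated with a minimal Python sketch. It shows (a) hold-speaker-out splits, where each speaker in turn forms the test set, and (b) repeated random splits that ignore speaker identity. The utterance records, the function names `hold_speaker_out_splits` and `random_splits`, and the 20% test fraction are hypothetical choices for illustration only. The study compares ten split methods, including heuristic and adversarial ones, which are not shown here, and this is not the authors' code.

```python
import random
from collections import defaultdict


def hold_speaker_out_splits(utterances):
    """Hypothetical helper: yield (speaker, train, test) once per speaker.

    The held-out speaker's utterances form the test set; every other
    speaker's utterances form the training set.
    """
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt["speaker"]].append(utt)
    for speaker in sorted(by_speaker):
        test = by_speaker[speaker]
        train = [u for u in utterances if u["speaker"] != speaker]
        yield speaker, train, test


def random_splits(utterances, test_fraction=0.2, n_splits=5, seed=0):
    """Hypothetical helper: yield n_splits random (train, test) splits.

    Speaker identity is ignored, so a speaker may appear on both sides.
    """
    rng = random.Random(seed)  # seeded so the splits are reproducible
    n_test = max(1, round(len(utterances) * test_fraction))
    for _ in range(n_splits):
        shuffled = list(utterances)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy data: three speakers with four utterances each.
    utts = [{"id": f"{s}-{i}", "speaker": s} for s in ("A", "B", "C") for i in range(4)]
    for speaker, train, test in hold_speaker_out_splits(utts):
        print(f"held out {speaker}: {len(train)} train / {len(test)} test")
    for k, (train, test) in enumerate(random_splits(utts)):
        print(f"random split {k}: {len(train)} train / {len(test)} test")
```

In either scheme, one model would be trained and scored per split, and the per-split WERs averaged. With very few speakers, the hold-speaker-out scheme yields only as many test sets as there are speakers, which is one reason the per-speaker results can vary so much.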
2022
Enhancing Documentation of Hupa with Automatic Speech Recognition
Zoey Liu, Justin Spence, Emily Prud’hommeaux
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
This study investigates applications of automatic speech recognition (ASR) techniques to Hupa, a critically endangered Native American language from the Dene (Athabaskan) language family. Using about 9 hours 12 minutes of speech produced by one elder who is a first-language Hupa speaker, we experimented with different evaluation schemes and training settings. On average, a fully connected deep neural network reached a word error rate of 35.26%. Our overall results illustrate the utility of ASR for making Hupa language documentation more accessible and usable. In addition, we found that when training acoustic models, using recordings with transcripts that were not carefully verified did not necessarily have a negative effect on model performance. This shows promise for speech corpora of indigenous languages, which commonly include transcriptions produced by second-language speakers or by linguists with advanced knowledge of the language of interest.
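Word error rate, the metric reported in these abstracts, is conventionally defined as WER = (S + D + I) / N. Here S, D and I are the word substitutions, deletions and insertions in a minimum-cost alignment of the hypothesis to the reference, and N is the number of reference words. The following minimal Python sketch applies that standard definition using word-level Levenshtein distance. The function names are hypothetical, and the scoring toolkit actually used in these studies is not stated here.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference word sequence into the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of reference word r
                curr[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),   # substitution, or match at zero cost
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER for one utterance: edits divided by the number of reference words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / n


def corpus_wer(pairs):
    """Corpus-level WER: total edits over total reference words."""
    edits = sum(word_edit_distance(r, h) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return edits / words


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words
    print(corpus_wer([("a b c", "a x c d"), ("the cat", "the cat")]))
```

Pooling edits over the whole test set, as `corpus_wer` does, is not the same as averaging per-utterance WERs, because longer utterances carry more weight. Because insertions are counted, WER can also exceed 100%.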