Explicit Tone Transcription Improves ASR Performance in Extremely Low-Resource Languages: A Case Study in Bribri

Rolando Coto-Solano


Abstract
Linguistic tone is transcribed for input into ASR systems in numerous ways. This paper presents a systematic test of several transcription styles, using as a case study Bribri, an extremely low-resource Chibchan language from Costa Rica. The most successful models separate the tone from the vowel, so that the ASR algorithms learn tone patterns independently. These models showed improvements ranging from 4% to 25% in character error rate (CER), and between 3% and 23% in word error rate (WER). The improvement holds for both traditional GMM/HMM and end-to-end CTC algorithms. This paper also presents the first attempt to train ASR models for Bribri. The best performing models had a CER of 33% and a WER of 50%. Although they rely on hand-engineered representations, these models were trained on only 68 minutes of data, and therefore show the potential of ASR to generate further training materials and aid in the documentation and revitalization of the language.
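As a rough illustration of what "separating the tone from the vowel" and the two error metrics can look like in practice, here is a minimal Python sketch. It is not the paper's implementation: the set of combining marks treated as tones, the function names `separate_tones`, `edit_distance`, and `error_rate`, and the sample string are all assumptions made for this example.

```python
import unicodedata

# Assumption: these combining marks stand in for tone diacritics.
# The paper's actual tone inventory and encoding may differ.
TONE_MARKS = {"\u0300", "\u0301", "\u0302"}  # grave, acute, circumflex


def separate_tones(word):
    """Return a token list in which each tone mark is its own token,
    placed right after the character it was attached to."""
    units = []  # each unit: [base char + non-tone marks, tone marks]
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS and units:
            units[-1][1] += ch
        elif unicodedata.combining(ch) and units:
            units[-1][0] += ch  # e.g. a nasal mark stays on the vowel
        else:
            units.append([ch, ""])
    tokens = []
    for base, tones in units:
        tokens.append(base)
        tokens.extend(tones)
    return tokens


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by reference length.
    Pass character tokens for CER, or word lists for WER."""
    return edit_distance(ref, hyp) / len(ref)


# Illustrative string only; not claimed to be a real Bribri word.
ref = separate_tones("al\u00e0")  # 'a', 'l', 'a', combining grave
hyp = separate_tones("al\u00e1")  # 'a', 'l', 'a', combining acute
print(ref, error_rate(ref, hyp))
```

With this representation the vowel and its tone are separate symbols, so a hypothesis that gets the vowel right but the tone wrong differs from the reference in one token rather than in a whole fused character.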
Anthology ID:
2021.americasnlp-1.20
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Editors:
Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
173–184
URL:
https://aclanthology.org/2021.americasnlp-1.20
DOI:
10.18653/v1/2021.americasnlp-1.20
Cite (ACL):
Rolando Coto-Solano. 2021. Explicit Tone Transcription Improves ASR Performance in Extremely Low-Resource Languages: A Case Study in Bribri. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 173–184, Online. Association for Computational Linguistics.
Cite (Informal):
Explicit Tone Transcription Improves ASR Performance in Extremely Low-Resource Languages: A Case Study in Bribri (Coto-Solano, AmericasNLP 2021)
PDF:
https://aclanthology.org/2021.americasnlp-1.20.pdf