Comparing efficacy of IPA vs Pinyin romanisation transcriptions for complex tonal languages: A case study in Baima

Katia Chirkova; Rolando Coto-Solano; Rachael Griffiths; Marieke Meelen

Abstract

How is automated tone transcription affected by the choice of transcription orthography? In this paper we present a range of experiments that indicate that, even when the tonal representations are kept the same, the way vowels and consonants are transcribed can affect tonal character outputs. Our results also indicate that using a Language Model (LM) for decoding can mitigate problems with tonal outputs, but tones remain the most difficult part of the transcription. In doing this we also present the first Automatic Speech Recognition (ASR) models for the Baima language, spoken in Sichuan and Gansu, China. We hope to use these models to contribute to ongoing documentation efforts.

Anthology ID: 2025.computel-main.20
Volume: Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month: March
Year: 2025
Address: Honolulu, Hawaii, USA
Editors: Jordan Lachler, Godfred Agyapong, Antti Arppe, Sarah Moeller, Aditi Chaudhary, Shruti Rijhwani, Daisy Rosenblum
Venues: ComputEL | WS
Publisher: Association for Computational Linguistics
Pages: 170–181
URL: https://aclanthology.org/2025.computel-main.20/
Cite (ACL): Katia Chirkova, Rolando Coto-Solano, Rachael Griffiths, and Marieke Meelen. 2025. Comparing efficacy of IPA vs Pinyin romanisation transcriptions for complex tonal languages: A case study in Baima. In Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 170–181, Honolulu, Hawaii, USA. Association for Computational Linguistics.
Cite (Informal): Comparing efficacy of IPA vs Pinyin romanisation transcriptions for complex tonal languages: A case study in Baima (Chirkova et al., ComputEL 2025)
PDF: https://aclanthology.org/2025.computel-main.20.pdf
