Automated Tone Transcription and Clustering with Tone2Vec

Yi Yang, Yiming Wang, ZhiQiang Tang, Jiahong Yuan


Abstract
Lexical tones play a crucial role in Sino-Tibetan languages. However, current phonetic fieldwork relies on manual effort, which incurs substantial time and financial costs. This is especially challenging for the many endangered languages that are rapidly disappearing, a problem often compounded by limited funding. In this paper, we introduce Tone2Vec, a pitch-based similarity representation for tone transcription. Experiments on dialect clustering and variance show that Tone2Vec effectively captures fine-grained tone variation. Building on Tone2Vec, we develop the first automatic approach to tone transcription and clustering by presenting a novel representation transformation for transcriptions. These algorithms are integrated into an open-source, easy-to-use package, ToneLab, which facilitates automated fieldwork and cross-regional, cross-lexical analysis of tonal languages. Extensive experiments demonstrate the effectiveness of our methods.
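
The abstract does not spell out how Tone2Vec is computed, so the following is only an illustrative sketch of what a pitch-based similarity between tone transcriptions might look like. It assumes transcriptions are written in Chao tone numerals (1 = lowest pitch, 5 = highest), approximates each transcription as a piecewise-linear pitch curve over normalized time, and takes the area between two curves as their distance. The function names `tone_curve` and `tone_distance` are hypothetical and are not taken from ToneLab.

```python
def tone_curve(transcription, samples=101):
    """Piecewise-linear pitch curve for a Chao tone-numeral string such as "214".

    Assumption: each digit is a pitch level from 1 (low) to 5 (high), and the
    levels are spread evenly over normalized time [0, 1].
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    levels = [int(ch) for ch in transcription]
    if not levels or any(not 1 <= v <= 5 for v in levels):
        raise ValueError("expected one or more digits between 1 and 5")
    if len(levels) == 1:
        # A single digit is treated as a level tone.
        return [float(levels[0])] * samples

    segments = len(levels) - 1
    curve = []
    for i in range(samples):
        pos = i / (samples - 1) * segments   # position along the contour
        k = min(int(pos), segments - 1)      # index of the current segment
        frac = pos - k                       # progress within that segment
        curve.append(levels[k] + frac * (levels[k + 1] - levels[k]))
    return curve


def tone_distance(a, b, samples=101):
    """Area between two pitch curves on [0, 1], using the trapezoidal rule."""
    curve_a = tone_curve(a, samples)
    curve_b = tone_curve(b, samples)
    diffs = [abs(x - y) for x, y in zip(curve_a, curve_b)]
    step = 1 / (samples - 1)
    return step * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))


if __name__ == "__main__":
    # Under this sketch, a high rising tone sits closer to a high level tone
    # than a high falling tone does.
    print(tone_distance("55", "35"))
    print(tone_distance("55", "51"))
```

Under these assumptions, transcriptions with similar contours receive small distances, and a matrix of such pairwise distances could in principle feed a standard clustering routine. The representation described in the paper may differ from this sketch.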
Anthology ID:
2024.findings-emnlp.112
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2054–2065
URL:
https://aclanthology.org/2024.findings-emnlp.112
Cite (ACL):
Yi Yang, Yiming Wang, ZhiQiang Tang, and Jiahong Yuan. 2024. Automated Tone Transcription and Clustering with Tone2Vec. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2054–2065, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Automated Tone Transcription and Clustering with Tone2Vec (Yang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.112.pdf