Local Word Discovery for Interactive Transcription

William Lane, Steven Bird


Abstract
Human expertise and the participation of speech communities are essential factors in the success of technologies for low-resource languages. Accordingly, we propose a new computational task which is tuned to the available knowledge and interests in an Indigenous community, and which supports the construction of high-quality texts and lexicons. The task is illustrated for Kunwinjku, a morphologically complex Australian language. We combine a finite state implementation of a published grammar with a partial lexicon, and apply this to a noisy phone representation of the signal. We locate known lexemes in the signal and use the morphological transducer to build these out into hypothetical, morphologically complex words for human validation. We show that applying a single iteration of this method results in a relative transcription density gain of 17%. Further, we find that 75% of breath groups in the test set receive at least one correct partial or full-word suggestion.
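The abstract describes two steps: spotting known lexemes in a noisy phone transcription, then building each spotted lexeme out into longer candidate words. The sketch below illustrates that idea under simple assumptions and is not the authors' implementation. It treats the phone transcription as a plain string with one character per phone, spots lexemes with semi-global edit distance (Sellers' algorithm) using unit costs, and stands in for the finite state transducer with hypothetical toy prefix and suffix lists. All function names (`last_row`, `approximate_matches`, `best_distance`, `expand_candidates`) and the toy data are illustrative.

```python
from typing import List, Sequence, Tuple


def last_row(signal: str, word: str) -> List[int]:
    """Semi-global edit distance (Sellers' algorithm), unit costs.

    Returns row[j] = fewest edits turning `word` into some substring of
    `signal` that ends at position j. A match may start anywhere in the
    signal at no cost, so the initial row is all zeros.
    """
    prev = [0] * (len(signal) + 1)
    for i in range(1, len(word) + 1):
        curr = [i] + [0] * len(signal)
        for j in range(1, len(signal) + 1):
            sub = 0 if word[i - 1] == signal[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + sub,  # match or phone substitution
                prev[j] + 1,        # phone of `word` missing from the signal
                curr[j - 1] + 1,    # extra phone in the signal
            )
        prev = curr
    return prev


def approximate_matches(signal: str, word: str, max_dist: int) -> List[Tuple[int, int]]:
    """(end_position, distance) wherever `word` occurs within max_dist edits."""
    row = last_row(signal, word)
    return [(j, d) for j, d in enumerate(row) if j > 0 and d <= max_dist]


def best_distance(signal: str, word: str) -> int:
    """Smallest edit distance between `word` and any substring of `signal`."""
    return min(last_row(signal, word))


def expand_candidates(
    signal: str,
    root: str,
    prefixes: Sequence[str],
    suffixes: Sequence[str],
    max_dist: int,
) -> List[Tuple[str, int]]:
    """Build a spotted root out into longer hypothetical forms.

    The affix lists are a placeholder for a morphological generator: every
    prefix + root + suffix that still matches the signal within `max_dist`
    edits is kept, closest first, then longest first.
    """
    found = []
    for p in prefixes:
        for s in suffixes:
            form = p + root + s
            d = best_distance(signal, form)
            if d <= max_dist:
                found.append((form, d))
    return sorted(found, key=lambda fd: (fd[1], -len(fd[0])))


if __name__ == "__main__":
    # Toy data: one character per phone; affixes are placeholders, not Kunwinjku.
    signal = "xxabcdefyy"
    print(approximate_matches(signal, "cd", max_dist=0))
    # [(6, 0)]
    print(expand_candidates(signal, "cd", ["", "ab", "zz"], ["", "ef", "qq"], max_dist=0))
    # [('abcdef', 0), ('cdef', 0), ('abcd', 0), ('cd', 0)]
```

Semi-global alignment fits this setting because a lexeme can occur anywhere inside a breath group, so unmatched phones before and after it should not be penalised. A fuller version would generate forms with a real morphological transducer and weight substitutions by phone confusability; both are left as assumptions here.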
Anthology ID: 2021.emnlp-main.157
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2058–2067
URL: https://aclanthology.org/2021.emnlp-main.157
DOI: 10.18653/v1/2021.emnlp-main.157
Cite (ACL): William Lane and Steven Bird. 2021. Local Word Discovery for Interactive Transcription. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2058–2067, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Local Word Discovery for Interactive Transcription (Lane & Bird, EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.157.pdf
Video: https://aclanthology.org/2021.emnlp-main.157.mp4