Developing ASR for Indonesian-English Bilingual Language Teaching

Zara Maxwell-Smith, Ben Foley


Abstract
Usage-based analyses of teacher corpora and code-switching (Boztepe, 2003) are an important next stage in understanding language acquisition. Multilingual corpora are difficult to compile, and a classroom setting adds pedagogy to the mix of factors that make this data so rich and so problematic to classify. Using quantitative methods to understand language learning and teaching is difficult because the ‘transcription bottleneck’ constrains the size of datasets. We found that using an automatic speech recognition (ASR) toolkit with a small set of training data is likely to speed data collection in this context (Maxwell-Smith et al., 2020).
Anthology ID:
2021.calcs-1.17
Volume:
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:
June
Year:
2021
Address:
Online
Editors:
Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
Venue:
CALCS
Publisher:
Association for Computational Linguistics
Pages:
131–132
URL:
https://aclanthology.org/2021.calcs-1.17
DOI:
10.18653/v1/2021.calcs-1.17
Cite (ACL):
Zara Maxwell-Smith and Ben Foley. 2021. Developing ASR for Indonesian-English Bilingual Language Teaching. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 131–132, Online. Association for Computational Linguistics.
Cite (Informal):
Developing ASR for Indonesian-English Bilingual Language Teaching (Maxwell-Smith & Foley, CALCS 2021)
PDF:
https://aclanthology.org/2021.calcs-1.17.pdf