Bottlenecks of In-Context Learning for Fieldwork ASR: A Case-study of Panãra

Siyu Liang; Myriam Lapierre; Gina-Anne Levow

Abstract

In-context learning (ICL) enables ASR models to transcribe unseen languages by conditioning on a handful of audio-transcript pairs at inference time, with no fine-tuning. This is appealing for language documentation, where transcribed data is scarce and recording conditions vary across sessions. We evaluate ICL on Panãra (Northern Jê, Brazil), a language with a complex practical orthography in which diacritics encode phonemic contrasts, across seven fieldwork recordings varying in speaker, narrative, and recording context. We find substantial within-language variation in transcription accuracy unexplained by any single recording-level factor, and show that diacritics are a systematic bottleneck with pronounced differences across diacritic types. An orthographic manipulation experiment further shows that how diacritics are represented in context transcriptions substantially affects model performance. These results highlight orthographic complexity and recording-level variation as key practical challenges for ICL-assisted fieldwork transcription.
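To make the notion of "how diacritics are represented" concrete, the following is a minimal sketch of three possible text representations of the same transcript using Unicode normalization. It is an illustration only: the abstract does not specify which manipulations the authors applied, and the function names here are hypothetical.

```python
import unicodedata
from collections import Counter


def compose(text: str) -> str:
    """NFC: base letter and diacritic fused into one code point where possible."""
    return unicodedata.normalize("NFC", text)


def decompose(text: str) -> str:
    """NFD: each diacritic becomes a separate combining mark after its base letter."""
    return unicodedata.normalize("NFD", text)


def strip_diacritics(text: str) -> str:
    """Drop all combining marks (an assumed, lossy variant for comparison)."""
    return "".join(ch for ch in decompose(text) if not unicodedata.combining(ch))


def diacritic_counts(text: str) -> Counter:
    """Count combining marks by Unicode name, e.g. to tally errors per diacritic type."""
    return Counter(
        unicodedata.name(ch) for ch in decompose(text) if unicodedata.combining(ch)
    )


word = "Pan\u00e3ra"  # "Panãra" with a precomposed a-tilde
print(len(compose(word)), len(decompose(word)))  # code point counts differ
print(strip_diacritics(word))
print(diacritic_counts(word))
```

The composed and decomposed forms render identically but are different code point sequences, so a model conditioned on one form sees different context text than a model conditioned on the other.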

Anthology ID: 2026.computel-1.17
Volume: Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Godfred Agyapong, Sarah Moeller, Antti Arppe, Ali Marashian, Daisy Rosenblum
Venues: ComputEL | WS
Publisher: Association for Computational Linguistics
Pages: 157–166
URL: https://aclanthology.org/2026.computel-1.17/
Cite (ACL): Siyu Liang, Myriam Lapierre, and Gina-Anne Levow. 2026. Bottlenecks of In-Context Learning for Fieldwork ASR: A Case-study of Panãra. In Proceedings of the Ninth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-9), pages 157–166, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Bottlenecks of In-Context Learning for Fieldwork ASR: A Case-study of Panãra (Liang et al., ComputEL 2026)
PDF: https://aclanthology.org/2026.computel-1.17.pdf
Supplementary material: 2026.computel-1.17.SupplementaryMaterial.txt
Supplementary material: 2026.computel-1.17.SupplementaryMaterial.zip
