Scaling ASR for Hutsul Dialect: Multi-Speaker Data Collection, Enhanced Transcription and Cross-Speaker Evaluation

Artem Orlovskyi; Zakhar Guzii; Bohdan Onyshchenko; Roman Kyslyi; Pavlo Khomenko

Abstract

We present a significant expansion of ASR resources for the Hutsul dialect of Ukrainian, building on prior work that established the first aligned speech corpus from a single literary source. In this work, we scale the dataset from a single speaker to a multi-speaker corpus of 40 speakers and 60.63 hours of audio drawn from diverse sources: YouTube channels (used with the channel authors' permission), field recordings of native speakers, recordings made by linguistics students, and regional radio broadcasts. To obtain reference transcriptions for audio without existing text, we introduce a novel correction pipeline based on retrieval-augmented generation (RAG): audio is first transcribed with ElevenLabs, and the draft transcript is then corrected by a retrieval-augmented, dialect-aware language model. We evaluate a fine-tuned ASR model on five distinct speaker datasets, showing that while the model performs strongly on in-domain speakers (CER 3.24%), cross-speaker generalization remains challenging, with CER ranging from 5.33% to 17.24% depending on speaker characteristics. All data, code, and models are publicly released to support further research on speech technologies for Ukrainian dialects.
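The results above are reported as character error rate (CER). As a minimal sketch, assuming the standard definition (character-level Levenshtein distance between reference and hypothesis, summed over the test set and divided by the total number of reference characters), CER can be computed as follows. The abstract does not say whether scores are pooled over the corpus or averaged per utterance, nor what text normalization (case, punctuation, apostrophe variants) is applied before scoring; the function names `levenshtein` and `cer` are illustrative and are not taken from the released code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn ref into hyp (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edit distance / total reference characters.

    Assumption: scores are pooled over the whole test set and no text
    normalization is applied; the paper's exact protocol may differ.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    if chars == 0:
        raise ValueError("references contain no characters")
    return edits / chars
```

For example, `cer(["ґазда"], ["газда"])` returns 0.2 (one substituted character out of five), which would be reported as a CER of 20%. Pooling over the corpus weights long utterances more heavily than averaging per-utterance scores, so the two conventions can give different numbers on the same data.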

Anthology ID: 2026.unlp-1.16
Volume: Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month: May
Year: 2026
Address: Lviv, Ukraine
Editor: Mariana Romanyshyn
Venue: UNLP
Publisher: Association for Computational Linguistics
Pages: 184–198
URL: https://aclanthology.org/2026.unlp-1.16/
Cite (ACL): Artem Orlovskyi, Zakhar Guzii, Bohdan Onyshchenko, Roman Kyslyi, and Pavlo Khomenko. 2026. Scaling ASR for Hutsul Dialect: Multi-Speaker Data Collection, Enhanced Transcription and Cross-Speaker Evaluation. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 184–198, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Scaling ASR for Hutsul Dialect: Multi-Speaker Data Collection, Enhanced Transcription and Cross-Speaker Evaluation (Orlovskyi et al., UNLP 2026)
PDF: https://aclanthology.org/2026.unlp-1.16.pdf
