Fotheidil: an Automatic Transcription System for the Irish Language

Liam Lonergan, Ibon Saratxaga, John Sloan, Oscar Maharg Bravo, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide


Abstract
This paper presents Fotheidil, the first web-based transcription system for the Irish language, which utilises speech-related AI technologies as part of the ABAIR initiative. The system combines off-the-shelf pre-trained voice activity detection and speaker diarisation models with models trained specifically for Irish automatic speech recognition (ASR) and for capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements for out-of-domain test sets and for dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration involving sequence-to-sequence models is compared with the conventional approach using a classification model. Experimental results show substantial improvements in performance here as well. It is intended that the system will be made freely available for public use, and it represents an important resource for researchers and others who transcribe Irish language materials. Human-corrected transcriptions will be collected and included in the training dataset as the system is used, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.
Anthology ID:
2025.cltw-1.4
Volume:
Proceedings of the 5th Celtic Language Technology Workshop
Month:
January
Year:
2025
Address:
Abu Dhabi [Virtual Workshop]
Editors:
Brian Davis, Theodorus Fransen, Elaine Ui Dhonnchadha, Abigail Walsh
Venues:
CLTW | WS
Publisher:
International Committee on Computational Linguistics
Pages:
35–45
URL:
https://aclanthology.org/2025.cltw-1.4/
Cite (ACL):
Liam Lonergan, Ibon Saratxaga, John Sloan, Oscar Maharg Bravo, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, and Ailbhe Ní Chasaide. 2025. Fotheidil: an Automatic Transcription System for the Irish Language. In Proceedings of the 5th Celtic Language Technology Workshop, pages 35–45, Abu Dhabi [Virtual Workshop]. International Committee on Computational Linguistics.
Cite (Informal):
Fotheidil: an Automatic Transcription System for the Irish Language (Lonergan et al., CLTW 2025)
PDF:
https://aclanthology.org/2025.cltw-1.4.pdf