Mengjie Qian


2025

Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan | Ibon Saratxaga | John Sloan | Oscar Maharg Bravo | Mengjie Qian | Neasa Ní Chiaráin | Christer Gobl | Ailbhe Ní Chasaide
Proceedings of the 5th Celtic Language Technology Workshop

This paper sets out the first web-based transcription system for the Irish language - Fotheidil, a system that utilises speech-related AI technologies as part of the ABAIR initiative. The system includes both off-the-shelf pre-trained voice activity detection and speaker diarisation models, and models trained specifically for Irish automatic speech recognition and for capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements for out-of-domain test sets and for dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration involving sequence-to-sequence models is compared with the conventional approach using a classification model, and experimental results here also show substantial improvements in performance. It is intended that the system will be made freely available for public use, and it represents an important resource for researchers and others who transcribe Irish-language materials. Human-corrected transcriptions will be collected and included in the training dataset as the system is used, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.
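The semi-supervised step described above can be illustrated with a minimal sketch. This is not the paper's actual procedure (its selection criteria are not given here): a common formulation decodes unlabelled audio with a seed model and keeps only confident hypotheses as pseudo-labels for retraining. The `decode` callable, threshold value, and utterance IDs below are all illustrative assumptions.

```python
# Illustrative sketch of confidence-filtered pseudo-labelling for
# semi-supervised acoustic model training. `decode` stands in for the
# seed TDNN-HMM decoder and returns (hypothesis, confidence).

def select_pseudo_labels(utterances, decode, threshold=0.9):
    """Keep (utterance_id, hypothesis) pairs whose decoder confidence
    clears the threshold; these augment the supervised training set."""
    kept = []
    for utt in utterances:
        hyp, confidence = decode(utt)
        if confidence >= threshold:
            kept.append((utt, hyp))
    return kept

# Dummy decoder output for two hypothetical utterances.
fake = {"utt1": ("dia dhuit", 0.95), "utt2": ("slán", 0.60)}
selected = select_pseudo_labels(fake, lambda u: fake[u])
print(selected)  # only utt1 survives the 0.9 threshold
```

In practice the retained pseudo-labelled utterances would be pooled with the supervised data and the acoustic model retrained, which is how such schemes can help under-resourced dialects with little transcribed speech.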

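The conventional classification approach that the seq2seq models are compared against can be sketched as per-token tagging: each token receives a capitalisation label and a punctuation label, which are then applied to the raw ASR output. The label names and the Irish example below are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the classification formulation of capitalisation
# and punctuation restoration: apply per-token predicted tags to
# lowercase, unpunctuated ASR output.

def apply_labels(tokens, cap_labels, punct_labels):
    """Reconstruct formatted text from per-token tags.
    cap label "UPPER_FIRST" capitalises the token's first letter;
    punct label is the character to append ("" for none)."""
    out = []
    for tok, cap, punct in zip(tokens, cap_labels, punct_labels):
        if cap == "UPPER_FIRST":
            tok = tok[0].upper() + tok[1:]
        out.append(tok + punct)
    return " ".join(out)

tokens = ["tá", "an", "lá", "go", "breá"]
cap = ["UPPER_FIRST", "NONE", "NONE", "NONE", "NONE"]
punct = ["", "", "", "", "."]
print(apply_labels(tokens, cap, punct))  # Tá an lá go breá.
```

A seq2seq model instead generates the formatted text directly, which lets it make edits a fixed tag set cannot express; the paper reports that this direction performs substantially better.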
2022

Automatic Speech Recognition for Irish: the ABAIR-ÉIST System
Liam Lonergan | Mengjie Qian | Harald Berthelsen | Andy Murphy | Christoph Wendler | Neasa Ní Chiaráin | Christer Gobl | Ailbhe Ní Chasaide
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

This paper describes ÉIST, an automatic speech recogniser for Irish developed as part of the ongoing ABAIR initiative, combining (1) acoustic models, (2) pronunciation lexicons and (3) language models into a hybrid system. A current priority is a system that can deal with the multiple, diverse native-speaker dialects. Consequently, the acoustic models (1) were built using predominantly native-speaker speech, which included earlier recordings used for synthesis development as well as more diverse recordings obtained using the MíleGlór platform. Pronunciation variation across the dialects is a particular challenge in the development of the lexicons (2) and is explored by testing both Trans-dialect and Multi-dialect letter-to-sound rules. Two approaches to language modelling (3) are used in the hybrid system: a simple n-gram model and recurrent neural network lattice rescoring, the latter garnering impressive performance improvements. The system is evaluated using a test set comprised of both native and non-native speakers, which allows some inferences to be made on the performance of the system for both cohorts.
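The hybrid scoring and rescoring idea described above can be sketched in miniature. This is not the ÉIST implementation (which uses lattices and an RNN LM); the sketch below re-ranks an n-best list by combining each hypothesis's acoustic score with a toy add-one-smoothed bigram LM, with a made-up Irish corpus, made-up acoustic scores, and an assumed interpolation weight.

```python
# Illustrative sketch of hypothesis rescoring in a hybrid ASR system:
# combine acoustic log-probabilities with language-model scores and
# pick the highest-scoring hypothesis.
import math

def rescore(hypotheses, lm_score, lm_weight=0.7):
    """Re-rank (text, acoustic_logprob) pairs with an external LM."""
    return max(hypotheses,
               key=lambda h: h[1] + lm_weight * lm_score(h[0]))

# Toy bigram LM with add-one smoothing over a tiny made-up corpus.
corpus = "tá an lá go breá agus tá an ghrian ag taitneamh".split()
bigrams, unigram = {}, {}
for a, b in zip(corpus, corpus[1:]):
    bigrams[(a, b)] = bigrams.get((a, b), 0) + 1
    unigram[a] = unigram.get(a, 0) + 1
vocab = set(corpus)

def lm_score(text):
    toks = text.split()
    s = 0.0
    for a, b in zip(toks, toks[1:]):
        s += math.log((bigrams.get((a, b), 0) + 1) /
                      (unigram.get(a, 0) + len(vocab)))
    return s

# With equal acoustic scores, the LM breaks the tie in favour of the
# phrase whose bigrams appear in the corpus.
hyps = [("tá an lá go breá", -12.0), ("tá an lá go brách", -12.0)]
best = rescore(hyps, lm_score)
print(best[0])  # tá an lá go breá
```

In a real hybrid system the same principle applies, but the first-pass n-gram scores are baked into a decoding lattice and a stronger neural LM rescores the lattice paths, which is where the reported improvements come from.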