Automatic Speech Recognition for Irish: the ABAIR-ÉIST System

Liam Lonergan, Mengjie Qian, Harald Berthelsen, Andy Murphy, Christoph Wendler, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide


Abstract
This paper describes ÉIST, an automatic speech recogniser for Irish developed as part of the ongoing ABAIR initiative. ÉIST is a hybrid system combining (1) acoustic models, (2) pronunciation lexicons and (3) language models. A current priority is a system that can handle the many diverse dialects of native speakers. Consequently, (1) was built using predominantly native-speaker speech, which included earlier recordings used for synthesis development as well as more diverse recordings obtained using the MíleGlór platform. Pronunciation variation across the dialects is a particular challenge in the development of (2) and is explored by testing both Trans-dialect and Multi-dialect letter-to-sound rules. Two approaches to language modelling (3) are used in the hybrid system: a simple n-gram model and recurrent neural network lattice rescoring, the latter yielding substantial performance improvements. The system is evaluated on a test set comprising both native and non-native speakers, which allows some inferences to be drawn about the system's performance on each cohort.
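The abstract does not spell out how a second-pass language model changes the output of a first-pass n-gram decode. The sketch below illustrates the general idea with N-best rescoring, a simpler relative of the lattice rescoring used in ÉIST, swapped in here for brevity. Each hypothesis keeps its acoustic score, its n-gram LM score is interpolated with a second LM score, and the list is re-ranked. The `Hypothesis` class, the `rescore` function, the weights `lm_scale` and `interp`, the toy per-word "neural" LM and all scores are hypothetical illustrations, not the authors' implementation or data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Hypothesis:
    words: tuple[str, ...]
    acoustic_logprob: float  # first-pass acoustic-model log-likelihood (hypothetical)
    ngram_logprob: float     # first-pass n-gram LM log-probability (hypothetical)


def rescore(
    hyps: Sequence[Hypothesis],
    second_lm: Callable[[tuple[str, ...]], float],
    lm_scale: float = 1.0,
    interp: float = 0.5,
) -> list[tuple[float, Hypothesis]]:
    """Re-rank hypotheses with a second-pass LM, best first.

    The LM score is a linear interpolation (in the log domain) of the n-gram
    and second-pass LM log-probabilities, with `interp` as the weight on the
    second LM; it is scaled by `lm_scale` and added to the acoustic score.
    """
    scored = []
    for h in hyps:
        lm = (1.0 - interp) * h.ngram_logprob + interp * second_lm(h.words)
        scored.append((h.acoustic_logprob + lm_scale * lm, h))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical stand-in for an RNN LM: fixed per-word log-probabilities,
# with a large penalty for out-of-vocabulary words.
TOY_LOGPROBS = {"tá": -1.0, "sé": -1.5, "go": -1.2, "maith": -1.3, "math": -6.0}


def toy_neural_lm(words: tuple[str, ...]) -> float:
    return sum(TOY_LOGPROBS.get(w, -10.0) for w in words)


# Two competing hypotheses; the first pass (acoustic + n-gram) prefers the first.
EXAMPLE_HYPS = [
    Hypothesis(("tá", "sé", "go", "math"), acoustic_logprob=-20.0, ngram_logprob=-8.0),
    Hypothesis(("tá", "sé", "go", "maith"), acoustic_logprob=-20.5, ngram_logprob=-8.5),
]

if __name__ == "__main__":
    for score, h in rescore(EXAMPLE_HYPS, toy_neural_lm):
        print(f"{score:.2f}", " ".join(h.words))
```

With these made-up numbers the first pass scores "tá sé go math" higher (-28.0 vs -29.0), but the second LM strongly disfavours "math", so after rescoring "tá sé go maith" ranks first (-27.25 vs -28.85). Lattice rescoring applies the same kind of score update to every path in a word lattice rather than to a short list, which is why it can recover hypotheses that an N-best list might have pruned.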
Anthology ID: 2022.cltw-1.7
Volume: Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
Month: June
Year: 2022
Address: Marseille, France
Editors: Theodorus Fransen, William Lamb, Delyth Prys
Venue: CLTW
Publisher: European Language Resources Association
Pages: 47–51
URL: https://aclanthology.org/2022.cltw-1.7
Cite (ACL): Liam Lonergan, Mengjie Qian, Harald Berthelsen, Andy Murphy, Christoph Wendler, Neasa Ní Chiaráin, Christer Gobl, and Ailbhe Ní Chasaide. 2022. Automatic Speech Recognition for Irish: the ABAIR-ÉIST System. In Proceedings of the 4th Celtic Language Technology Workshop within LREC2022, pages 47–51, Marseille, France. European Language Resources Association.
Cite (Informal): Automatic Speech Recognition for Irish: the ABAIR-ÉIST System (Lonergan et al., CLTW 2022)
PDF: https://aclanthology.org/2022.cltw-1.7.pdf