eSTÓR: Curating Irish Datasets for Machine Translation

Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, Brian Davis


Abstract
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
Anthology ID:
2025.mtsummit-2.28
Volume:
Proceedings of Machine Translation Summit XX: Volume 2
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Editors:
Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
Venue:
MTSummit
Publisher:
European Association for Machine Translation
Pages:
115–116
URL:
https://aclanthology.org/2025.mtsummit-2.28/
Cite (ACL):
Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, and Brian Davis. 2025. eSTÓR: Curating Irish Datasets for Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 2, pages 115–116, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal):
eSTÓR: Curating Irish Datasets for Machine Translation (Walsh et al., MTSummit 2025)
PDF:
https://aclanthology.org/2025.mtsummit-2.28.pdf