eSTÓR: Curating Irish Datasets for Machine Translation
Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, Brian Davis
Abstract
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
- Anthology ID:
- 2025.mtsummit-2.28
- Volume:
- Proceedings of Machine Translation Summit XX: Volume 2
- Month:
- June
- Year:
- 2025
- Address:
- Geneva, Switzerland
- Editors:
- Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
- Venue:
- MTSummit
- Publisher:
- European Association for Machine Translation
- Pages:
- 115–116
- URL:
- https://aclanthology.org/2025.mtsummit-2.28/
- Bibkey:
- walsh-etal-2025-estor
- Cite (ACL):
- Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, and Brian Davis. 2025. eSTÓR: Curating Irish Datasets for Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 2, pages 115–116, Geneva, Switzerland. European Association for Machine Translation.
- Cite (Informal):
- eSTÓR: Curating Irish Datasets for Machine Translation (Walsh et al., MTSummit 2025)
- PDF:
- https://aclanthology.org/2025.mtsummit-2.28.pdf
Export citation
@inproceedings{walsh-etal-2025-estor,
    title = "e{ST{\'O}R}: Curating {I}rish Datasets for Machine Translation",
    author = "Walsh, Abigail and
      Loinsigh, {\'O}rla N{\'i} and
      Adkins, Jane and
      O{'}Connell, Ornait and
      Andrade, Mark and
      Clifford, Teresa and
      Gaspari, Federico and
      Dunne, Jane and
      Davis, Brian",
    editor = {Bouillon, Pierrette and
      Gerlach, Johanna and
      Girletti, Sabrina and
      Volkart, Lise and
      Rubino, Raphael and
      Sennrich, Rico and
      L{\"a}ubli, Samuel and
      Volk, Martin and
      Espl{\`a}-Gomis, Miquel and
      Vandeghinste, Vincent and
      Moniz, Helena and
      Szoc, Sara},
    booktitle = "Proceedings of Machine Translation Summit XX: Volume 2",
    month = jun,
    year = "2025",
    address = "Geneva, Switzerland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2025.mtsummit-2.28/",
    pages = "115--116",
    ISBN = "978-2-9701897-1-8",
    abstract = "Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eST{\'O}R project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="walsh-etal-2025-estor">
    <titleInfo>
        <title>eSTÓR: Curating Irish Datasets for Machine Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Abigail</namePart>
        <namePart type="family">Walsh</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Órla</namePart>
        <namePart type="given">Ní</namePart>
        <namePart type="family">Loinsigh</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Jane</namePart>
        <namePart type="family">Adkins</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Ornait</namePart>
        <namePart type="family">O’Connell</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Mark</namePart>
        <namePart type="family">Andrade</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Teresa</namePart>
        <namePart type="family">Clifford</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Federico</namePart>
        <namePart type="family">Gaspari</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Jane</namePart>
        <namePart type="family">Dunne</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <name type="personal">
        <namePart type="given">Brian</namePart>
        <namePart type="family">Davis</namePart>
        <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
    </name>
    <originInfo>
        <dateIssued>2025-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of Machine Translation Summit XX: Volume 2</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Pierrette</namePart>
            <namePart type="family">Bouillon</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Johanna</namePart>
            <namePart type="family">Gerlach</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Sabrina</namePart>
            <namePart type="family">Girletti</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Lise</namePart>
            <namePart type="family">Volkart</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Raphael</namePart>
            <namePart type="family">Rubino</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Rico</namePart>
            <namePart type="family">Sennrich</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Samuel</namePart>
            <namePart type="family">Läubli</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Martin</namePart>
            <namePart type="family">Volk</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Miquel</namePart>
            <namePart type="family">Esplà-Gomis</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Vincent</namePart>
            <namePart type="family">Vandeghinste</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Helena</namePart>
            <namePart type="family">Moniz</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <name type="personal">
            <namePart type="given">Sara</namePart>
            <namePart type="family">Szoc</namePart>
            <role><roleTerm authority="marcrelator" type="text">editor</roleTerm></role>
        </name>
        <originInfo>
            <publisher>European Association for Machine Translation</publisher>
            <place>
                <placeTerm type="text">Geneva, Switzerland</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">978-2-9701897-1-8</identifier>
    </relatedItem>
    <abstract>Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.</abstract>
    <identifier type="citekey">walsh-etal-2025-estor</identifier>
    <location>
        <url>https://aclanthology.org/2025.mtsummit-2.28/</url>
    </location>
    <part>
        <date>2025-06</date>
        <extent unit="page">
            <start>115</start>
            <end>116</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T eSTÓR: Curating Irish Datasets for Machine Translation
%A Walsh, Abigail
%A Loinsigh, Órla Ní
%A Adkins, Jane
%A O’Connell, Ornait
%A Andrade, Mark
%A Clifford, Teresa
%A Gaspari, Federico
%A Dunne, Jane
%A Davis, Brian
%Y Bouillon, Pierrette
%Y Gerlach, Johanna
%Y Girletti, Sabrina
%Y Volkart, Lise
%Y Rubino, Raphael
%Y Sennrich, Rico
%Y Läubli, Samuel
%Y Volk, Martin
%Y Esplà-Gomis, Miquel
%Y Vandeghinste, Vincent
%Y Moniz, Helena
%Y Szoc, Sara
%S Proceedings of Machine Translation Summit XX: Volume 2
%D 2025
%8 June
%I European Association for Machine Translation
%C Geneva, Switzerland
%@ 978-2-9701897-1-8
%F walsh-etal-2025-estor
%X Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
%U https://aclanthology.org/2025.mtsummit-2.28/
%P 115-116
Markdown (Informal)
[eSTÓR: Curating Irish Datasets for Machine Translation](https://aclanthology.org/2025.mtsummit-2.28/) (Walsh et al., MTSummit 2025)