eSTÓR: Curating Irish Datasets for Machine Translation
Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, Brian Davis
Abstract
Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
- Anthology ID:
- 2025.mtsummit-2.28
- Volume:
- Proceedings of Machine Translation Summit XX: Volume 2
- Month:
- June
- Year:
- 2025
- Address:
- Geneva, Switzerland
- Editors:
- Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
- Venue:
- MTSummit
- SIG:
- Publisher:
- European Association for Machine Translation
- Note:
- Pages:
- 115–116
- Language:
- URL:
- https://aclanthology.org/2025.mtsummit-2.28/
- DOI:
- Bibkey:
- walsh-etal-2025-estor
- Cite (ACL):
- Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, and Brian Davis. 2025. eSTÓR: Curating Irish Datasets for Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 2, pages 115–116, Geneva, Switzerland. European Association for Machine Translation.
- Cite (Informal):
- eSTÓR: Curating Irish Datasets for Machine Translation (Walsh et al., MTSummit 2025)
- PDF:
- https://aclanthology.org/2025.mtsummit-2.28.pdf
Export citation
@inproceedings{walsh-etal-2025-estor,
title = "e{ST{\'O}R}: Curating {I}rish Datasets for Machine Translation",
author = "Walsh, Abigail and
N{\'i} Loinsigh, {\'O}rla and
Adkins, Jane and
O{'}Connell, Ornait and
Andrade, Mark and
Clifford, Teresa and
Gaspari, Federico and
Dunne, Jane and
Davis, Brian",
editor = {Bouillon, Pierrette and
Gerlach, Johanna and
Girletti, Sabrina and
Volkart, Lise and
Rubino, Raphael and
Sennrich, Rico and
L{\"a}ubli, Samuel and
Volk, Martin and
Espl{\`a}-Gomis, Miquel and
Vandeghinste, Vincent and
Moniz, Helena and
Szoc, Sara},
booktitle = "Proceedings of Machine Translation Summit XX: Volume 2",
month = jun,
year = "2025",
address = "Geneva, Switzerland",
publisher = "European Association for Machine Translation",
url = "https://aclanthology.org/2025.mtsummit-2.28/",
pages = "115--116",
ISBN = "978-2-9701897-1-8",
abstract = "Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eST{\'O}R project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="walsh-etal-2025-estor">
<titleInfo>
<title>eSTÓR: Curating Irish Datasets for Machine Translation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Abigail</namePart>
<namePart type="family">Walsh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Órla</namePart>
<namePart type="family">Ní Loinsigh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jane</namePart>
<namePart type="family">Adkins</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ornait</namePart>
<namePart type="family">O’Connell</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mark</namePart>
<namePart type="family">Andrade</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Teresa</namePart>
<namePart type="family">Clifford</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Federico</namePart>
<namePart type="family">Gaspari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jane</namePart>
<namePart type="family">Dunne</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Brian</namePart>
<namePart type="family">Davis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of Machine Translation Summit XX: Volume 2</title>
</titleInfo>
<name type="personal">
<namePart type="given">Pierrette</namePart>
<namePart type="family">Bouillon</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Johanna</namePart>
<namePart type="family">Gerlach</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sabrina</namePart>
<namePart type="family">Girletti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lise</namePart>
<namePart type="family">Volkart</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Raphael</namePart>
<namePart type="family">Rubino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rico</namePart>
<namePart type="family">Sennrich</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Samuel</namePart>
<namePart type="family">Läubli</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Martin</namePart>
<namePart type="family">Volk</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Miquel</namePart>
<namePart type="family">Esplà-Gomis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vincent</namePart>
<namePart type="family">Vandeghinste</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Helena</namePart>
<namePart type="family">Moniz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sara</namePart>
<namePart type="family">Szoc</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>European Association for Machine Translation</publisher>
<place>
<placeTerm type="text">Geneva, Switzerland</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">978-2-9701897-1-8</identifier>
</relatedItem>
<abstract>Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.</abstract>
<identifier type="citekey">walsh-etal-2025-estor</identifier>
<location>
<url>https://aclanthology.org/2025.mtsummit-2.28/</url>
</location>
<part>
<date>2025-06</date>
<extent unit="page">
<start>115</start>
<end>116</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T eSTÓR: Curating Irish Datasets for Machine Translation
%A Walsh, Abigail
%A Ní Loinsigh, Órla
%A Adkins, Jane
%A O’Connell, Ornait
%A Andrade, Mark
%A Clifford, Teresa
%A Gaspari, Federico
%A Dunne, Jane
%A Davis, Brian
%Y Bouillon, Pierrette
%Y Gerlach, Johanna
%Y Girletti, Sabrina
%Y Volkart, Lise
%Y Rubino, Raphael
%Y Sennrich, Rico
%Y Läubli, Samuel
%Y Volk, Martin
%Y Esplà-Gomis, Miquel
%Y Vandeghinste, Vincent
%Y Moniz, Helena
%Y Szoc, Sara
%S Proceedings of Machine Translation Summit XX: Volume 2
%D 2025
%8 June
%I European Association for Machine Translation
%C Geneva, Switzerland
%@ 978-2-9701897-1-8
%F walsh-etal-2025-estor
%X Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
%U https://aclanthology.org/2025.mtsummit-2.28/
%P 115-116
Markdown (Informal)
[eSTÓR: Curating Irish Datasets for Machine Translation](https://aclanthology.org/2025.mtsummit-2.28/) (Walsh et al., MTSummit 2025)