eSTÓR: Curating Irish Datasets for Machine Translation

Abigail Walsh; Órla Ní Loinsigh; Jane Adkins; Ornait O’Connell; Mark Andrade; Teresa Clifford; Federico Gaspari; Jane Dunne; Brian Davis

Abstract

Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.

Anthology ID: 2025.mtsummit-2.28
Volume: Proceedings of Machine Translation Summit XX: Volume 2
Month: June
Year: 2025
Address: Geneva, Switzerland
Editors: Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
Venue: MTSummit
Publisher: European Association for Machine Translation
Pages: 115–116
URL: https://aclanthology.org/2025.mtsummit-2.28/
Bibkey: walsh-etal-2025-estor
Cite (ACL): Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, and Brian Davis. 2025. eSTÓR: Curating Irish Datasets for Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 2, pages 115–116, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal): eSTÓR: Curating Irish Datasets for Machine Translation (Walsh et al., MTSummit 2025)
PDF: https://aclanthology.org/2025.mtsummit-2.28.pdf

Export citation

BibTeX

@inproceedings{walsh-etal-2025-estor,
    title = "e{ST{\'O}R}: Curating {I}rish Datasets for Machine Translation",
    author = "Walsh, Abigail  and
      Loinsigh, {\'O}rla N{\'i}  and
      Adkins, Jane  and
      O{'}Connell, Ornait  and
      Andrade, Mark  and
      Clifford, Teresa  and
      Gaspari, Federico  and
      Dunne, Jane  and
      Davis, Brian",
    editor = {Bouillon, Pierrette  and
      Gerlach, Johanna  and
      Girletti, Sabrina  and
      Volkart, Lise  and
      Rubino, Raphael  and
      Sennrich, Rico  and
      L{\"a}ubli, Samuel  and
      Volk, Martin  and
      Espl{\`a}-Gomis, Miquel  and
      Vandeghinste, Vincent  and
      Moniz, Helena  and
      Szoc, Sara},
    booktitle = "Proceedings of Machine Translation Summit XX: Volume 2",
    month = jun,
    year = "2025",
    address = "Geneva, Switzerland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2025.mtsummit-2.28/",
    pages = "115--116",
    ISBN = "978-2-9701897-1-8",
    abstract = "Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eST{\'O}R project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="walsh-etal-2025-estor">
    <titleInfo>
        <title>eSTÓR: Curating Irish Datasets for Machine Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Abigail</namePart>
        <namePart type="family">Walsh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Órla</namePart>
        <namePart type="given">Ní</namePart>
        <namePart type="family">Loinsigh</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jane</namePart>
        <namePart type="family">Adkins</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ornait</namePart>
        <namePart type="family">O’Connell</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mark</namePart>
        <namePart type="family">Andrade</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Teresa</namePart>
        <namePart type="family">Clifford</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Federico</namePart>
        <namePart type="family">Gaspari</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jane</namePart>
        <namePart type="family">Dunne</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Brian</namePart>
        <namePart type="family">Davis</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of Machine Translation Summit XX: Volume 2</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Pierrette</namePart>
            <namePart type="family">Bouillon</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Johanna</namePart>
            <namePart type="family">Gerlach</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sabrina</namePart>
            <namePart type="family">Girletti</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lise</namePart>
            <namePart type="family">Volkart</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Raphael</namePart>
            <namePart type="family">Rubino</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rico</namePart>
            <namePart type="family">Sennrich</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Samuel</namePart>
            <namePart type="family">Läubli</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Martin</namePart>
            <namePart type="family">Volk</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Miquel</namePart>
            <namePart type="family">Esplà-Gomis</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vincent</namePart>
            <namePart type="family">Vandeghinste</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Helena</namePart>
            <namePart type="family">Moniz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sara</namePart>
            <namePart type="family">Szoc</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>European Association for Machine Translation</publisher>
            <place>
                <placeTerm type="text">Geneva, Switzerland</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">978-2-9701897-1-8</identifier>
    </relatedItem>
    <abstract>Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.</abstract>
    <identifier type="citekey">walsh-etal-2025-estor</identifier>
    <location>
        <url>https://aclanthology.org/2025.mtsummit-2.28/</url>
    </location>
    <part>
        <date>2025-06</date>
        <extent unit="page">
            <start>115</start>
            <end>116</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T eSTÓR: Curating Irish Datasets for Machine Translation
%A Walsh, Abigail
%A Loinsigh, Órla Ní
%A Adkins, Jane
%A O’Connell, Ornait
%A Andrade, Mark
%A Clifford, Teresa
%A Gaspari, Federico
%A Dunne, Jane
%A Davis, Brian
%Y Bouillon, Pierrette
%Y Gerlach, Johanna
%Y Girletti, Sabrina
%Y Volkart, Lise
%Y Rubino, Raphael
%Y Sennrich, Rico
%Y Läubli, Samuel
%Y Volk, Martin
%Y Esplà-Gomis, Miquel
%Y Vandeghinste, Vincent
%Y Moniz, Helena
%Y Szoc, Sara
%S Proceedings of Machine Translation Summit XX: Volume 2
%D 2025
%8 June
%I European Association for Machine Translation
%C Geneva, Switzerland
%@ 978-2-9701897-1-8
%F walsh-etal-2025-estor
%X Minority languages such as Irish are massively under-resourced, particularly in terms of high-quality domain-relevant data, limiting the capabilities of machine translation (MT) engines, even those integrating large language models (LLMs). The eSTÓR project, described in this paper, focuses on the collection and curation of high-quality Irish text data for diverse domains.
%U https://aclanthology.org/2025.mtsummit-2.28/
%P 115-116

Markdown (Informal)

[eSTÓR: Curating Irish Datasets for Machine Translation](https://aclanthology.org/2025.mtsummit-2.28/) (Walsh et al., MTSummit 2025)

ACL

Abigail Walsh, Órla Ní Loinsigh, Jane Adkins, Ornait O’Connell, Mark Andrade, Teresa Clifford, Federico Gaspari, Jane Dunne, and Brian Davis. 2025. eSTÓR: Curating Irish Datasets for Machine Translation. In Proceedings of Machine Translation Summit XX: Volume 2, pages 115–116, Geneva, Switzerland. European Association for Machine Translation.