Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering

Philipp Koehn; Huda Khayrallah; Kenneth Heafield; Mikel L. Forcada

doi:10.18653/v1/W18-6453

Abstract

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.

Anthology ID: W18-6453
Volume: Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month: October
Year: 2018
Address: Belgium, Brussels
Editors: Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 726–739
URL: https://aclanthology.org/W18-6453/
DOI: 10.18653/v1/W18-6453
Cite (ACL): Philipp Koehn, Huda Khayrallah, Kenneth Heafield, and Mikel L. Forcada. 2018. Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 726–739, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal): Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering (Koehn et al., WMT 2018)
PDF: https://aclanthology.org/W18-6453.pdf

@inproceedings{koehn-etal-2018-findings,
    title = "Findings of the {WMT} 2018 Shared Task on Parallel Corpus Filtering",
    author = "Koehn, Philipp  and
      Khayrallah, Huda  and
      Heafield, Kenneth  and
      Forcada, Mikel L.",
    editor = "Bojar, Ond{\v{r}}ej  and
      Chatterjee, Rajen  and
      Federmann, Christian  and
      Fishel, Mark  and
      Graham, Yvette  and
      Haddow, Barry  and
      Huck, Matthias  and
      Yepes, Antonio Jimeno  and
      Koehn, Philipp  and
      Monz, Christof  and
      Negri, Matteo  and
      N{\'e}v{\'e}ol, Aur{\'e}lie  and
      Neves, Mariana  and
      Post, Matt  and
      Specia, Lucia  and
      Turchi, Marco  and
      Verspoor, Karin",
    booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers",
    month = oct,
    year = "2018",
    address = "Belgium, Brussels",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-6453/",
    doi = "10.18653/v1/W18-6453",
    pages = "726--739",
    abstract = "We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1{\%} and 10{\%} of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="koehn-etal-2018-findings">
    <titleInfo>
        <title>Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Philipp</namePart>
        <namePart type="family">Koehn</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Huda</namePart>
        <namePart type="family">Khayrallah</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kenneth</namePart>
        <namePart type="family">Heafield</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mikel</namePart>
        <namePart type="given">L</namePart>
        <namePart type="family">Forcada</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Conference on Machine Translation: Shared Task Papers</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Ondřej</namePart>
            <namePart type="family">Bojar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rajen</namePart>
            <namePart type="family">Chatterjee</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christian</namePart>
            <namePart type="family">Federmann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mark</namePart>
            <namePart type="family">Fishel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yvette</namePart>
            <namePart type="family">Graham</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Barry</namePart>
            <namePart type="family">Haddow</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matthias</namePart>
            <namePart type="family">Huck</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Antonio</namePart>
            <namePart type="given">Jimeno</namePart>
            <namePart type="family">Yepes</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Philipp</namePart>
            <namePart type="family">Koehn</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christof</namePart>
            <namePart type="family">Monz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matteo</namePart>
            <namePart type="family">Negri</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Aurélie</namePart>
            <namePart type="family">Névéol</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mariana</namePart>
            <namePart type="family">Neves</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matt</namePart>
            <namePart type="family">Post</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lucia</namePart>
            <namePart type="family">Specia</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marco</namePart>
            <namePart type="family">Turchi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Karin</namePart>
            <namePart type="family">Verspoor</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Belgium, Brussels</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.</abstract>
    <identifier type="citekey">koehn-etal-2018-findings</identifier>
    <identifier type="doi">10.18653/v1/W18-6453</identifier>
    <location>
        <url>https://aclanthology.org/W18-6453/</url>
    </location>
    <part>
        <date>2018-10</date>
        <extent unit="page">
            <start>726</start>
            <end>739</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
%A Koehn, Philipp
%A Khayrallah, Huda
%A Heafield, Kenneth
%A Forcada, Mikel L.
%Y Bojar, Ondřej
%Y Chatterjee, Rajen
%Y Federmann, Christian
%Y Fishel, Mark
%Y Graham, Yvette
%Y Haddow, Barry
%Y Huck, Matthias
%Y Yepes, Antonio Jimeno
%Y Koehn, Philipp
%Y Monz, Christof
%Y Negri, Matteo
%Y Névéol, Aurélie
%Y Neves, Mariana
%Y Post, Matt
%Y Specia, Lucia
%Y Turchi, Marco
%Y Verspoor, Karin
%S Proceedings of the Third Conference on Machine Translation: Shared Task Papers
%D 2018
%8 October
%I Association for Computational Linguistics
%C Belgium, Brussels
%F koehn-etal-2018-findings
%X We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities participated in this task.
%R 10.18653/v1/W18-6453
%U https://aclanthology.org/W18-6453/
%U https://doi.org/10.18653/v1/W18-6453
%P 726-739

Markdown (Informal)

[Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering](https://aclanthology.org/W18-6453/) (Koehn et al., WMT 2018)
