Towards a Better Evaluation of Metrics for Machine Translation

Peter Stanchev; Weiyue Wang; Hermann Ney

doi:10.18653/v1/2020.wmt-1.103

Abstract

An important aspect of machine translation is its evaluation, which can be achieved through the use of a variety of metrics. To compare these metrics, the workshop on statistical machine translation annually evaluates metrics based on their correlation with human judgement. Over the years, methods for measuring correlation with humans have changed, but little research has been performed on what the optimal methods for acquiring human scores are and how human correlation can be measured. In this work, the methods for evaluating metrics at both system- and segment-level are analyzed in detail and their shortcomings are pointed out.
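As a rough illustration of what "correlation with human judgement" can mean at the system level, the sketch below computes Pearson's r between one metric's scores and human scores for a handful of systems. This is only one possible way of measuring correlation and is not presented as the procedure analyzed in the paper; the function name `pearson` and all score values are hypothetical and invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical system-level scores, one entry per MT system.
# These numbers are made up and do not come from any evaluation campaign.
metric_scores = [27.1, 30.4, 25.8, 33.0]
human_scores = [0.12, 0.31, -0.05, 0.44]

r = pearson(metric_scores, human_scores)
print(f"system-level Pearson r = {r:.3f}")
```

With only a few systems, a coefficient like this rests on very few data points, which is one reason the choice of correlation method deserves scrutiny.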

Anthology ID: 2020.wmt-1.103
Volume: Proceedings of the Fifth Conference on Machine Translation
Month: November
Year: 2020
Address: Online
Editors: Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 928–933
URL: https://aclanthology.org/2020.wmt-1.103/
DOI: 10.18653/v1/2020.wmt-1.103
Bibkey: stanchev-etal-2020-towards
Cite (ACL): Peter Stanchev, Weiyue Wang, and Hermann Ney. 2020. Towards a Better Evaluation of Metrics for Machine Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 928–933, Online. Association for Computational Linguistics.
Cite (Informal): Towards a Better Evaluation of Metrics for Machine Translation (Stanchev et al., WMT 2020)
PDF: https://aclanthology.org/2020.wmt-1.103.pdf
Video: https://slideslive.com/38939548


Export citation

BibTeX

@inproceedings{stanchev-etal-2020-towards,
    title = "Towards a Better Evaluation of Metrics for Machine Translation",
    author = "Stanchev, Peter  and
      Wang, Weiyue  and
      Ney, Hermann",
    editor = {Barrault, Lo{\"i}c  and
      Bojar, Ond{\v{r}}ej  and
      Bougares, Fethi  and
      Chatterjee, Rajen  and
      Costa-juss{\`a}, Marta R.  and
      Federmann, Christian  and
      Fishel, Mark  and
      Fraser, Alexander  and
      Graham, Yvette  and
      Guzman, Paco  and
      Haddow, Barry  and
      Huck, Matthias  and
      Yepes, Antonio Jimeno  and
      Koehn, Philipp  and
      Martins, Andr{\'e}  and
      Morishita, Makoto  and
      Monz, Christof  and
      Nagata, Masaaki  and
      Nakazawa, Toshiaki  and
      Negri, Matteo},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.103/",
    doi = "10.18653/v1/2020.wmt-1.103",
    pages = "928--933",
    abstract = "An important aspect of machine translation is its evaluation, which can be achieved through the use of a variety of metrics. To compare these metrics, the workshop on statistical machine translation annually evaluates metrics based on their correlation with human judgement. Over the years, methods for measuring correlation with humans have changed, but little research has been performed on what the optimal methods for acquiring human scores are and how human correlation can be measured. In this work, the methods for evaluating metrics at both system- and segment-level are analyzed in detail and their shortcomings are pointed out."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="stanchev-etal-2020-towards">
    <titleInfo>
        <title>Towards a Better Evaluation of Metrics for Machine Translation</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Peter</namePart>
        <namePart type="family">Stanchev</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Weiyue</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hermann</namePart>
        <namePart type="family">Ney</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Fifth Conference on Machine Translation</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Loïc</namePart>
            <namePart type="family">Barrault</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ondřej</namePart>
            <namePart type="family">Bojar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Fethi</namePart>
            <namePart type="family">Bougares</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rajen</namePart>
            <namePart type="family">Chatterjee</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marta</namePart>
            <namePart type="given">R</namePart>
            <namePart type="family">Costa-jussà</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christian</namePart>
            <namePart type="family">Federmann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mark</namePart>
            <namePart type="family">Fishel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alexander</namePart>
            <namePart type="family">Fraser</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yvette</namePart>
            <namePart type="family">Graham</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Paco</namePart>
            <namePart type="family">Guzman</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Barry</namePart>
            <namePart type="family">Haddow</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matthias</namePart>
            <namePart type="family">Huck</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Antonio</namePart>
            <namePart type="given">Jimeno</namePart>
            <namePart type="family">Yepes</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Philipp</namePart>
            <namePart type="family">Koehn</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">André</namePart>
            <namePart type="family">Martins</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Makoto</namePart>
            <namePart type="family">Morishita</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christof</namePart>
            <namePart type="family">Monz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Masaaki</namePart>
            <namePart type="family">Nagata</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Toshiaki</namePart>
            <namePart type="family">Nakazawa</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matteo</namePart>
            <namePart type="family">Negri</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>An important aspect of machine translation is its evaluation, which can be achieved through the use of a variety of metrics. To compare these metrics, the workshop on statistical machine translation annually evaluates metrics based on their correlation with human judgement. Over the years, methods for measuring correlation with humans have changed, but little research has been performed on what the optimal methods for acquiring human scores are and how human correlation can be measured. In this work, the methods for evaluating metrics at both system- and segment-level are analyzed in detail and their shortcomings are pointed out.</abstract>
    <identifier type="citekey">stanchev-etal-2020-towards</identifier>
    <identifier type="doi">10.18653/v1/2020.wmt-1.103</identifier>
    <location>
        <url>https://aclanthology.org/2020.wmt-1.103/</url>
    </location>
    <part>
        <date>2020-11</date>
        <extent unit="page">
            <start>928</start>
            <end>933</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Towards a Better Evaluation of Metrics for Machine Translation
%A Stanchev, Peter
%A Wang, Weiyue
%A Ney, Hermann
%Y Barrault, Loïc
%Y Bojar, Ondřej
%Y Bougares, Fethi
%Y Chatterjee, Rajen
%Y Costa-jussà, Marta R.
%Y Federmann, Christian
%Y Fishel, Mark
%Y Fraser, Alexander
%Y Graham, Yvette
%Y Guzman, Paco
%Y Haddow, Barry
%Y Huck, Matthias
%Y Yepes, Antonio Jimeno
%Y Koehn, Philipp
%Y Martins, André
%Y Morishita, Makoto
%Y Monz, Christof
%Y Nagata, Masaaki
%Y Nakazawa, Toshiaki
%Y Negri, Matteo
%S Proceedings of the Fifth Conference on Machine Translation
%D 2020
%8 November
%I Association for Computational Linguistics
%C Online
%F stanchev-etal-2020-towards
%X An important aspect of machine translation is its evaluation, which can be achieved through the use of a variety of metrics. To compare these metrics, the workshop on statistical machine translation annually evaluates metrics based on their correlation with human judgement. Over the years, methods for measuring correlation with humans have changed, but little research has been performed on what the optimal methods for acquiring human scores are and how human correlation can be measured. In this work, the methods for evaluating metrics at both system- and segment-level are analyzed in detail and their shortcomings are pointed out.
%R 10.18653/v1/2020.wmt-1.103
%U https://aclanthology.org/2020.wmt-1.103/
%U https://doi.org/10.18653/v1/2020.wmt-1.103
%P 928-933


Markdown (Informal)

[Towards a Better Evaluation of Metrics for Machine Translation](https://aclanthology.org/2020.wmt-1.103/) (Stanchev et al., WMT 2020)
