PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation

Liane Guillou; Christian Hardmeier


Abstract

We present PROTEST, a test suite for the evaluation of pronoun translation by MT systems. The test suite comprises 250 hand-selected pronoun tokens and an automatic evaluation method that compares the translations of pronouns in MT output with those in the reference translation. Pronoun translations that do not match the reference are referred for manual evaluation. PROTEST is designed to support analysis of system performance at the level of individual pronoun groups, rather than to provide a single aggregate measure over all pronouns. We wish to encourage detailed analyses that highlight issues in the handling of specific linguistic mechanisms by MT systems, thereby contributing to a better understanding of the problems involved in translating pronouns. We present two use cases for PROTEST: a) measuring the improvement or degradation that results from an incremental system change, and b) comparing the performance of a group of systems whose designs may be largely unrelated. Following the latter use case, we demonstrate the application of PROTEST to the evaluation of the systems submitted to the DiscoMT 2015 shared task on pronoun translation.
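The workflow described in the abstract (match against the reference, refer mismatches for manual evaluation, report per pronoun group) might be sketched roughly as follows. This is an illustrative sketch only, not the PROTEST implementation: the `PronounExample` record, the `evaluate` function, the group labels, the case-insensitive string match, and the toy data are all assumptions made for the example, and the sketch assumes the pronoun translations have already been extracted from the MT output and the reference.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PronounExample:
    """One test-suite item (hypothetical layout, not the PROTEST file format)."""
    example_id: str
    group: str              # illustrative pronoun-group label
    reference_pronoun: str  # pronoun translation in the reference
    system_pronoun: str     # pronoun translation in the MT output ("" if none)


def evaluate(examples):
    """Count reference matches per group and collect mismatches for manual review."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    manual_queue = []
    for ex in examples:
        totals[ex.group] += 1
        # Assumed matching rule: case-insensitive equality after trimming.
        if ex.system_pronoun.strip().lower() == ex.reference_pronoun.strip().lower():
            matches[ex.group] += 1
        else:
            # A mismatch is not necessarily an error, so it goes to a human.
            manual_queue.append(ex.example_id)
    per_group = {group: (matches[group], totals[group]) for group in totals}
    return per_group, manual_queue


# Toy data invented for illustration; not taken from the test suite.
SAMPLE = [
    PronounExample("ex1", "anaphoric", "elle", "il"),
    PronounExample("ex2", "anaphoric", "il", "Il"),
    PronounExample("ex3", "pleonastic", "il", "il"),
]

per_group, manual_queue = evaluate(SAMPLE)
for group, (matched, total) in per_group.items():
    print(f"{group}: {matched}/{total} match the reference")
print("referred for manual evaluation:", manual_queue)
```

Reporting a `(matched, total)` pair per group, rather than one overall score, mirrors the abstract's emphasis on analysis at the level of individual pronoun groups.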

Anthology ID: L16-1100
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 636–643
URL: https://aclanthology.org/L16-1100/
Cite (ACL): Liane Guillou and Christian Hardmeier. 2016. PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 636–643, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation (Guillou & Hardmeier, LREC 2016)
PDF: https://aclanthology.org/L16-1100.pdf
