A Corpus of Machine Translation Errors Extracted from Translation Students Exercises

Guillaume Wisniewski, Natalie Kübler, François Yvon


Abstract
In this paper, we present a freely available corpus of automatic translations accompanied by post-edited versions and annotated with labels identifying the different kinds of errors made by the MT system. These data have been extracted from translation students' exercises that were corrected by a senior professor. This corpus can be useful for training quality estimation tools and for analyzing the types of errors made by MT systems.
Anthology ID:
L14-1085
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3585–3588
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1115_Paper.pdf
Cite (ACL):
Guillaume Wisniewski, Natalie Kübler, and François Yvon. 2014. A Corpus of Machine Translation Errors Extracted from Translation Students Exercises. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3585–3588, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
A Corpus of Machine Translation Errors Extracted from Translation Students Exercises (Wisniewski et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1115_Paper.pdf