Correct Me If You Can: Learning from Error Corrections and Markings

Julia Kreutzer, Nathaniel Berger, Stefan Riezler


Abstract
Sequence-to-sequence learning involves a trade-off between signal strength and annotation cost of training data. For example, machine translation data range from costly expert-generated translations that enable supervised learning, to weak quality-judgment feedback that facilitates reinforcement learning. We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings. We show that error markings for translations of TED talks from English to German allow precise credit assignment while requiring significantly less human effort than correcting/post-editing, and that error-marked data can be used successfully to fine-tune neural machine translation models.
Anthology ID:
2020.eamt-1.15
Volume:
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Month:
November
Year:
2020
Address:
Lisboa, Portugal
Editors:
André Martins, Helena Moniz, Sara Fumega, Bruno Martins, Fernando Batista, Luisa Coheur, Carla Parra, Isabel Trancoso, Marco Turchi, Arianna Bisazza, Joss Moorkens, Ana Guerberof, Mary Nurminen, Lena Marg, Mikel L. Forcada
Venue:
EAMT
Publisher:
European Association for Machine Translation
Pages:
135–144
URL:
https://aclanthology.org/2020.eamt-1.15
Cite (ACL):
Julia Kreutzer, Nathaniel Berger, and Stefan Riezler. 2020. Correct Me If You Can: Learning from Error Corrections and Markings. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 135–144, Lisboa, Portugal. European Association for Machine Translation.
Cite (Informal):
Correct Me If You Can: Learning from Error Corrections and Markings (Kreutzer et al., EAMT 2020)
PDF:
https://aclanthology.org/2020.eamt-1.15.pdf
Code:
StatNLP/mt-correct-mark-interface