Evaluation and Large-scale Training for Contextual Machine Translation

Matt Post, Marcin Junczys-Dowmunt


Abstract
Despite the fact that context is known to be vital for resolving a range of translation ambiguities, most traditional machine translation systems continue to be trained and to operate at the sentence level. A common explanation is the lack of document-level annotations for existing training data. This work investigates whether having such annotations would be helpful for training traditional MT systems at scale. We build large-scale, state-of-the-art contextual MT systems into German, French, and Russian, fixing the datasets while comparing the effect of sourcing contextual training samples from both parallel and back-translated data. We then evaluate these contextual models across a range of contextual test sets from the literature, where we find that (a) document annotations from both mined parallel and back-translated monolingual data are helpful, but that the best contextual MT systems do not draw contextual samples from the parallel data. We also make two points related to evaluation: (b) contrastive score-based metrics on challenge sets are not discriminative; instead, models must be tested directly on their ability to generate correct outputs, and (c) standard corpus-level metrics such as COMET work best in settings that are dense in contextual phenomena.
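Point (b) of the abstract distinguishes score-based contrastive evaluation from generation-based evaluation. The minimal Python sketch below illustrates that distinction only; it is not code from the paper, and the `ContextualMT` interface, its `score`/`translate` methods, and the `marker` field of each example are illustrative assumptions about how such a harness might look for a pronoun-style contrastive challenge set.

```python
# Illustrative sketch (not from the paper): two ways to use a contrastive
# challenge set. `score` and `translate` are hypothetical stand-ins for a
# contextual MT system's scoring and decoding interfaces.
from dataclasses import dataclass
from typing import Protocol


class ContextualMT(Protocol):
    def score(self, source: str, target: str) -> float: ...  # log-prob of target given source+context
    def translate(self, source: str) -> str: ...              # decoded hypothesis


@dataclass
class ContrastiveExample:
    source: str     # source sentence with preceding context prepended
    correct: str    # reference with the correct pronoun/term
    incorrect: str  # minimally edited contrastive reference
    marker: str     # surface form that signals a correct resolution


def contrastive_accuracy(model: ContextualMT, examples: list[ContrastiveExample]) -> float:
    """Score-based evaluation: does the model merely prefer the correct reference?"""
    hits = sum(
        model.score(ex.source, ex.correct) > model.score(ex.source, ex.incorrect)
        for ex in examples
    )
    return hits / len(examples)


def generative_accuracy(model: ContextualMT, examples: list[ContrastiveExample]) -> float:
    """Generation-based evaluation: does the model actually produce the disambiguating form?"""
    hits = sum(ex.marker.lower() in model.translate(ex.source).lower() for ex in examples)
    return hits / len(examples)
```

Under the score-based variant a model only has to rank two given references; under the generation-based variant it must actually produce the disambiguating form in its own output, which is the property the abstract argues is the discriminative one.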
Anthology ID: 2024.wmt-1.112
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1125–1139
URL: https://aclanthology.org/2024.wmt-1.112
Cite (ACL): Matt Post and Marcin Junczys-Dowmunt. 2024. Evaluation and Large-scale Training for Contextual Machine Translation. In Proceedings of the Ninth Conference on Machine Translation, pages 1125–1139, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Evaluation and Large-scale Training for Contextual Machine Translation (Post & Junczys-Dowmunt, WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.112.pdf