Promoting Target Data in Context-aware Neural Machine Translation

Harritxu Gete, Thierry Etchegoyhen


Abstract
Standard context-aware neural machine translation (NMT) typically relies on parallel document-level data, exploiting both source and target contexts. Concatenation-based approaches in particular, still a strong baseline for document-level NMT, prepend source and/or target context sentences to the sentences to be translated, with model variants that exploit equal amounts of source and target data on each side achieving state-of-the-art results. In this work, we investigate whether target data should be further promoted within standard concatenation-based approaches, as most document-level phenomena rely on information that is present on the target language side. We evaluate novel concatenation-based variants where the target context is prepended to the source language, either in isolation or in combination with the source context. Experimental results in English-Russian and Basque-Spanish show that including target context in the source leads to large improvements on target language phenomena. On source-dependent phenomena, using only target language context in the source achieves parity with state-of-the-art concatenation approaches, or slightly underperforms, whereas combining source and target context on the source side leads to significant gains across the board.
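The variants described in the abstract can be illustrated with a minimal sketch of how the source-side input might be assembled. It rests on assumptions not stated in the abstract: the `<sep>` separator token, the function and variant names, the ordering of target before source context in the combined variant, and the toy English-Russian sentences are all hypothetical, chosen only for illustration.

```python
from typing import List

# Hypothetical separator token; the marker actually used in the paper is not given here.
SEP = "<sep>"


def build_source_input(
    src_sentence: str,
    src_context: List[str],
    tgt_context: List[str],
    variant: str,
) -> str:
    """Assemble the source-side input for one sentence to be translated.

    variant:
      "src_ctx"     - prepend source-language context only (standard concatenation)
      "tgt_ctx"     - prepend target-language context only
      "src_tgt_ctx" - prepend both target and source context
    """
    if variant == "src_ctx":
        context = list(src_context)
    elif variant == "tgt_ctx":
        context = list(tgt_context)
    elif variant == "src_tgt_ctx":
        # Assumption: target context comes first; the abstract does not fix an order.
        context = list(tgt_context) + list(src_context)
    else:
        raise ValueError(f"unknown variant: {variant}")
    # Context sentences precede the current sentence, joined by the separator.
    return f" {SEP} ".join(context + [src_sentence])


if __name__ == "__main__":
    # Toy example: the previous sentence and its (already produced) translation.
    src_ctx = ["She bought a book."]
    tgt_ctx = ["Она купила книгу."]
    for variant in ("src_ctx", "tgt_ctx", "src_tgt_ctx"):
        print(variant, "->", build_source_input("It was expensive.", src_ctx, tgt_ctx, variant))
```

In this toy case, the Russian context exposes the grammatical gender of the noun that "It" refers to, which is the kind of target-side information the abstract argues should be promoted.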
Anthology ID:
2024.eamt-1.6
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
9–23
URL:
https://aclanthology.org/2024.eamt-1.6
Cite (ACL):
Harritxu Gete and Thierry Etchegoyhen. 2024. Promoting Target Data in Context-aware Neural Machine Translation. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 9–23, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
Promoting Target Data in Context-aware Neural Machine Translation (Gete & Etchegoyhen, EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-1.6.pdf