An Evaluation of Source Factors in Concatenation-Based Context-Aware Neural Machine Translation

Harritxu Gete, Thierry Etchegoyhen


Abstract
We explore the use of source factors in context-aware neural machine translation, specifically in concatenation-based models, to improve the translation of inter-sentential phenomena. Context sentences are typically concatenated to the sentence to be translated, with string-based markers separating the latter from the former. Although previous studies have measured the impact of prefixes to identify and mark context information, the use of learnable factors has only been marginally explored. In this study, we evaluate the impact of single and multiple source context factors in English-German and Basque-Spanish contextual translation. We show that factors of this type can significantly enhance translation accuracy for phenomena such as gender and register coherence in Basque-Spanish, while also improving BLEU results in some scenarios. These results demonstrate the potential of factor-based context identification to improve context-aware machine translation in future research.
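As a rough illustration of the setup the abstract describes, the sketch below concatenates context sentences to the current sentence with a separator token and builds a parallel sequence of factor labels, one per token. It is a minimal sketch and not the authors' implementation: the separator string `<sep>`, the labels `C` (context) and `S` (sentence to translate), the whitespace tokenization, and the function name `concat_with_factors` are all assumptions made for the example.

```python
from typing import List, Tuple

SEP = "<sep>"          # hypothetical string marker between sentences
CONTEXT_FACTOR = "C"   # hypothetical factor label for context tokens
CURRENT_FACTOR = "S"   # hypothetical factor label for the sentence to translate


def concat_with_factors(context: List[str], current: str) -> Tuple[List[str], List[str]]:
    """Concatenate context sentences and the current sentence.

    Returns two aligned lists: the tokens and one factor label per token.
    Whitespace tokenization is used only to keep the example self-contained.
    """
    tokens: List[str] = []
    factors: List[str] = []

    for sentence in context:
        words = sentence.split()
        tokens.extend(words)
        factors.extend([CONTEXT_FACTOR] * len(words))
        # The separator closes each context sentence and is labeled as context.
        tokens.append(SEP)
        factors.append(CONTEXT_FACTOR)

    words = current.split()
    tokens.extend(words)
    factors.extend([CURRENT_FACTOR] * len(words))
    return tokens, factors


if __name__ == "__main__":
    toks, facs = concat_with_factors(["He is a doctor ."], "He works here .")
    for tok, fac in zip(toks, facs):
        print(f"{tok}|{fac}")
```

In a factored model, each factor label would typically have its own learned embedding that is combined with the token embedding, which is what distinguishes this approach from marking context with string prefixes alone. Using several factor sequences (for example, one per context sentence position) would follow the same pattern with a larger label set.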
Anthology ID:
2023.ranlp-1.45
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
399–407
URL:
https://aclanthology.org/2023.ranlp-1.45
Cite (ACL):
Harritxu Gete and Thierry Etchegoyhen. 2023. An Evaluation of Source Factors in Concatenation-Based Context-Aware Neural Machine Translation. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 399–407, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
An Evaluation of Source Factors in Concatenation-Based Context-Aware Neural Machine Translation (Gete & Etchegoyhen, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.45.pdf