Improving Domain Robustness in Neural Machine Translation with Fused Topic Knowledge Embeddings

Danai Xezonaki, Talaat Khalil, David Stap, Brandon Denis


Abstract
Domain robustness is a key challenge for Neural Machine Translation (NMT). Translating text drawn from a different distribution than the training set requires NMT models to generalize well to unseen domains. In this work we propose a novel way to address domain robustness by fusing external topic knowledge into the NMT architecture. We employ a pretrained denoising autoencoder and fuse topic information into the system during continued pretraining and during finetuning on the downstream NMT task. Our results show that incorporating external topic knowledge, as well as additional pretraining, can improve the out-of-domain performance of NMT models. The proposed methodology matches state-of-the-art out-of-domain performance. Our analysis shows that low overlap between the pretraining and finetuning corpora, as well as the quality of the topic representations, helps NMT systems become more robust under domain shift.
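The abstract describes fusing a document-level topic representation into the model's embeddings. As a minimal, hypothetical sketch of one such fusion scheme (a gated additive combination; the function name, shapes, and gate are illustrative, not the paper's actual architecture):

```python
# Illustrative sketch: fuse a document-level topic vector into per-token
# embeddings via a scalar gate. This is NOT the paper's implementation,
# just one common way "fused topic knowledge embeddings" can be realized.

def fuse(token_embs, topic_emb, gate):
    """Add the gated topic vector to every token embedding.

    token_embs: list of token embedding vectors (each a list of floats)
    topic_emb:  one topic vector of the same dimensionality
    gate:       scalar in [0, 1] controlling topic influence
    """
    return [
        [tok_dim + gate * topic_dim
         for tok_dim, topic_dim in zip(tok, topic_emb)]
        for tok in token_embs
    ]

tokens = [[0.1, 0.2], [0.3, 0.4]]   # two token embeddings, dim 2
topic = [1.0, -1.0]                 # document-level topic vector
fused = fuse(tokens, topic, 0.5)    # topic information shared by all tokens
```

In a real system the gate would typically be learned and the topic vector produced by a topic model or encoder, but the sketch shows the core idea: every token embedding carries a shared, document-level topic signal.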
Anthology ID:
2023.mtsummit-research.18
Volume:
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track
Month:
September
Year:
2023
Address:
Macau SAR, China
Editors:
Masao Utiyama, Rui Wang
Venue:
MTSummit
Publisher:
Asia-Pacific Association for Machine Translation
Pages:
209–221
URL:
https://aclanthology.org/2023.mtsummit-research.18
Cite (ACL):
Danai Xezonaki, Talaat Khalil, David Stap, and Brandon Denis. 2023. Improving Domain Robustness in Neural Machine Translation with Fused Topic Knowledge Embeddings. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 209–221, Macau SAR, China. Asia-Pacific Association for Machine Translation.
Cite (Informal):
Improving Domain Robustness in Neural Machine Translation with Fused Topic Knowledge Embeddings (Xezonaki et al., MTSummit 2023)
PDF:
https://aclanthology.org/2023.mtsummit-research.18.pdf