A Comparison of Sentence-Weighting Techniques for NMT

Simon Rieß, Matthias Huck, Alex Fraser


Abstract
Sentence weighting is a simple and powerful domain adaptation technique. We carry out domain classification for computing sentence weights with 1) language model cross entropy difference, 2) a convolutional neural network, and 3) a Recursive Neural Tensor Network. We compare these approaches with regard to domain classification accuracy and study the posterior probability distributions. Then we carry out NMT experiments in the scenario where we have no in-domain parallel corpora and only very limited in-domain monolingual corpora. Here, we use the domain classifier to reweight the sentences of our out-of-domain training corpus. This leads to improvements of up to 2.1 BLEU for German to English translation.
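As a rough illustration of the first approach named in the abstract, the following sketch scores a sentence by the difference between its cross entropy under an in-domain language model and under an out-of-domain one, then squashes that score into a weight. Everything specific here is an assumption made for illustration and is not taken from the paper: the add-one smoothed unigram models, the function names, the toy corpora, and the logistic mapping from score to weight.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count whitespace tokens; returns (counts, total token count)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-token cross entropy (bits) under an add-one smoothed unigram LM."""
    counts, total = model
    toks = sentence.split()
    logp = sum(math.log2((counts[t] + 1) / (total + vocab_size)) for t in toks)
    return -logp / max(len(toks), 1)


def ce_difference(sentence, in_model, out_model, vocab_size):
    """H_in(s) - H_out(s); lower values mean the sentence looks more in-domain."""
    return (cross_entropy(sentence, in_model, vocab_size)
            - cross_entropy(sentence, out_model, vocab_size))


def sentence_weight(sentence, in_model, out_model, vocab_size):
    """Hypothetical mapping of the score to (0, 1) with a logistic function."""
    diff = ce_difference(sentence, in_model, out_model, vocab_size)
    return 1.0 / (1.0 + math.exp(diff))


# Toy corpora (invented for this sketch).
in_domain = ["the patient received a dose", "the dose was reduced"]
out_domain = ["the parliament adopted the resolution", "the vote was postponed"]

in_model = train_unigram(in_domain)
out_model = train_unigram(out_domain)
vocab_size = len(set(in_model[0]) | set(out_model[0]))

w_medical = sentence_weight("the patient dose", in_model, out_model, vocab_size)
w_politics = sentence_weight("the parliament vote", in_model, out_model, vocab_size)
```

In this toy setup the medical-sounding sentence receives a weight above 0.5 and the parliamentary one a weight below 0.5. In a sentence-weighted training setup, such a weight would typically scale each training sentence's contribution to the loss.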
Anthology ID:
2021.mtsummit-research.15
Volume:
Proceedings of Machine Translation Summit XVIII: Research Track
Month:
August
Year:
2021
Address:
Virtual
Editors:
Kevin Duh, Francisco Guzmán
Venue:
MTSummit
Publisher:
Association for Machine Translation in the Americas
Pages:
176–187
URL:
https://aclanthology.org/2021.mtsummit-research.15
Cite (ACL):
Simon Rieß, Matthias Huck, and Alex Fraser. 2021. A Comparison of Sentence-Weighting Techniques for NMT. In Proceedings of Machine Translation Summit XVIII: Research Track, pages 176–187, Virtual. Association for Machine Translation in the Americas.
Cite (Informal):
A Comparison of Sentence-Weighting Techniques for NMT (Rieß et al., MTSummit 2021)
PDF:
https://aclanthology.org/2021.mtsummit-research.15.pdf