Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation

Dario Stojanovski, Alexander Fraser


Abstract
Achieving satisfactory performance in machine translation on domains for which there is no training data is challenging. Traditional supervised domain adaptation is not suitable for addressing such zero-resource domains because it relies on in-domain parallel data. We show that when in-domain parallel data is not available, access to document-level context enables better capturing of domain generalities than access to a single sentence alone. Having more information available yields a more reliable domain estimate. We present two document-level Transformer models capable of using large context sizes, and we compare them against strong Transformer baselines. We obtain improvements on the two zero-resource domains we study. We additionally provide an analysis in which we vary the amount of context and examine the case where in-domain data is available.
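The abstract describes models that condition the translation of each sentence on surrounding document-level context. As a minimal, illustrative sketch only (not the authors' exact architectures), one common way to expose such context to a Transformer NMT system is to prepend the previous k source sentences, separated by a boundary token, to the current sentence; the "<SEP>" token and the windowing policy below are assumptions for illustration.

```python
from typing import List

SEP = "<SEP>"  # hypothetical separator token marking sentence boundaries

def build_context_input(doc_sentences: List[List[str]],
                        index: int,
                        context_size: int) -> List[str]:
    """Concatenate up to `context_size` preceding sentences with the current one.

    doc_sentences: tokenized source sentences of one document, in order
    index:         position of the sentence to translate
    context_size:  how many preceding sentences to include as context
    """
    start = max(0, index - context_size)
    tokens: List[str] = []
    for ctx in doc_sentences[start:index]:
        tokens.extend(ctx)
        tokens.append(SEP)               # boundary between context sentences
    tokens.extend(doc_sentences[index])  # the sentence actually being translated
    return tokens

# Example: with context_size=2, sentence 2 is encoded together with sentences 0 and 1,
# giving the encoder more evidence from which to infer the (unseen) domain of the document.
doc = [["the", "patient", "was", "admitted", "."],
       ["symptoms", "improved", "overnight", "."],
       ["discharge", "is", "planned", "today", "."]]
print(build_context_input(doc, index=2, context_size=2))
```

Larger context sizes give the model more tokens from which to estimate the document's domain, which is the intuition the abstract appeals to; how context is actually integrated (concatenation, separate context encoders, caching, etc.) is a design choice discussed in the paper itself.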
Anthology ID:
2021.adaptnlp-1.9
Volume:
Proceedings of the Second Workshop on Domain Adaptation for NLP
Month:
April
Year:
2021
Address:
Kyiv, Ukraine
Editors:
Eyal Ben-David, Shay Cohen, Ryan McDonald, Barbara Plank, Roi Reichart, Guy Rotman, Yftah Ziser
Venue:
AdaptNLP
Publisher:
Association for Computational Linguistics
Pages:
80–93
URL:
https://aclanthology.org/2021.adaptnlp-1.9
Cite (ACL):
Dario Stojanovski and Alexander Fraser. 2021. Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation. In Proceedings of the Second Workshop on Domain Adaptation for NLP, pages 80–93, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation (Stojanovski & Fraser, AdaptNLP 2021)
PDF:
https://aclanthology.org/2021.adaptnlp-1.9.pdf