Two-Phase Cross-Lingual Language Model Fine-Tuning for Machine Translation Quality Estimation

Dongjun Lee


Abstract
In this paper, we describe the Bering Lab’s submission to the WMT 2020 Shared Task on Quality Estimation (QE). For word-level and sentence-level translation quality estimation, we fine-tune XLM-RoBERTa, the state-of-the-art cross-lingual language model, with a few additional parameters. Model training consists of two phases: we first pre-train the model on a large, artificially generated QE dataset, and then fine-tune it on a human-labeled dataset. When evaluated on the WMT 2020 English-German QE test set, our systems achieve the best result on target-side word-level QE and the second-best results on source-side word-level QE and sentence-level QE among all submissions.
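
As a rough illustration of the approach summarized above, the sketch below stacks a sentence-level regression head and a word-level OK/BAD tagging head on top of XLM-RoBERTa using the Hugging Face transformers library. The head layout, checkpoint name, and forward pass are illustrative assumptions, not the paper's exact configuration; in the two-phase setup described in the abstract, the same model would first be trained on artificially generated QE data and then fine-tuned on the human-labeled WMT data.

import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizerFast

class XLMRQualityEstimator(nn.Module):
    """Minimal joint word- and sentence-level QE model (illustrative sketch)."""

    def __init__(self, model_name: str = "xlm-roberta-large"):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Sentence-level head: predicts a single quality score from the <s> token.
        self.sent_head = nn.Linear(hidden, 1)
        # Word-level head: predicts OK/BAD logits for every subword token.
        self.word_head = nn.Linear(hidden, 2)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state              # (batch, seq_len, hidden)
        sent_score = self.sent_head(token_states[:, 0])   # first (<s>) token
        word_logits = self.word_head(token_states)        # per-token OK/BAD logits
        return sent_score.squeeze(-1), word_logits

# Usage example: encode a source/translation pair jointly and score it.
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-large")
model = XLMRQualityEstimator()
batch = tokenizer("This is a test .", "Das ist ein Test .",
                  return_tensors="pt", padding=True)
sent_score, word_logits = model(batch["input_ids"], batch["attention_mask"])

In training, the sentence score would be regressed against the gold quality label and the word logits trained with cross-entropy against OK/BAD tags, first on the synthetic corpus and then on the human-labeled data.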
Anthology ID:
2020.wmt-1.118
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
1024–1028
URL:
https://aclanthology.org/2020.wmt-1.118
Cite (ACL):
Dongjun Lee. 2020. Two-Phase Cross-Lingual Language Model Fine-Tuning for Machine Translation Quality Estimation. In Proceedings of the Fifth Conference on Machine Translation, pages 1024–1028, Online. Association for Computational Linguistics.
Cite (Informal):
Two-Phase Cross-Lingual Language Model Fine-Tuning for Machine Translation Quality Estimation (Lee, WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.118.pdf
Video:
https://slideslive.com/38939546