MMT’s Submission for the WMT 2023 Quality Estimation Shared Task

Yulong Wu, Viktor Schlegel, Daniel Beck, Riza Batista-Navarro


Abstract
This paper presents our submission to the sentence-level subtask of Task 1 of the WMT 2023 Quality Estimation (QE) shared task. We propose a straightforward training data augmentation approach aimed at improving the correlation between QE model predictions and human quality assessments. Using eleven data augmentation methods and six language pairs, we systematically create augmented training sets by applying each method individually to the original training set of each language pair. We assess the effectiveness of each augmentation method by the performance gap, measured on the development set, between the model before and after training on the augmented dataset. Experimental results show that synonym replacement via the Paraphrase Database (PPDB) yields the largest performance gain for English-German, English-Marathi and English-Gujarati, while for the remaining language pairs, methods such as word insertion based on contextual word embeddings, back translation, and direct paraphrasing prove more effective. Training the model on a larger and more diverse set of samples yields further improvements for some language pairs, but only marginally, and not for all of them. For our submission, we generate test-set predictions for each language pair, except English-German, with the model trained on the dataset augmented by that pair's most effective method. Although not highly competitive, our system surpasses the baseline on most language pairs and ranks third on English-Marathi.
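The per-language-pair selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `select_best_method`, the language-pair and method labels, and all scores are hypothetical placeholders, the correlation on the development set is assumed to have been computed beforehand, and falling back to the original model when no method helps is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple


def select_best_method(
    baseline: float, augmented: Dict[str, float]
) -> Tuple[Optional[str], float]:
    """Pick the augmentation method with the largest dev-set gain.

    Returns (None, 0.0) when no method beats the baseline; keeping the
    model trained on the original data in that case is an assumption
    of this sketch, not something stated in the paper.
    """
    best_method, best_gain = None, 0.0
    for method, score in augmented.items():
        gain = score - baseline  # performance gap on the development set
        if gain > best_gain:
            best_method, best_gain = method, gain
    return best_method, best_gain


# Placeholder dev-set correlations (hypothetical, NOT results from the paper).
dev_scores = {
    "en-xx": {"baseline": 0.50, "synonym_ppdb": 0.53, "back_translation": 0.52},
    "en-yy": {"baseline": 0.60, "synonym_ppdb": 0.59, "paraphrase": 0.58},
}

for pair, scores in dev_scores.items():
    candidates = dict(scores)            # copy so the input is not mutated
    baseline = candidates.pop("baseline")
    method, gain = select_best_method(baseline, candidates)
    print(pair, method, round(gain, 3))
```

Comparing each method against the same baseline score keeps the choice independent across language pairs, which matches the abstract's observation that the most effective method differs from pair to pair.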
Anthology ID:
2023.wmt-1.75
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
856–862
URL:
https://aclanthology.org/2023.wmt-1.75
DOI:
10.18653/v1/2023.wmt-1.75
Cite (ACL):
Yulong Wu, Viktor Schlegel, Daniel Beck, and Riza Batista-Navarro. 2023. MMT’s Submission for the WMT 2023 Quality Estimation Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 856–862, Singapore. Association for Computational Linguistics.
Cite (Informal):
MMT’s Submission for the WMT 2023 Quality Estimation Shared Task (Wu et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.75.pdf