The University of Sydney’s Machine Translation System for WMT19

Liang Ding, Dacheng Tao


Abstract
This paper describes the University of Sydney’s submission to the WMT 2019 shared news translation task. We participated in the Finnish->English direction and achieved the best BLEU score (33.0) among all participants. Our system is based on self-attentional Transformer networks, into which we integrated the most recent effective strategies from academic research (e.g., BPE, back translation, multi-features data selection, data augmentation, greedy model ensemble, reranking, ConMBR system combination, and postprocessing). Furthermore, we propose a novel augmentation method, Cycle Translation, and a data mixture strategy, Big/Small parallel construction, to fully exploit the synthetic corpus. Extensive experiments show that adding the above techniques yields continuous improvements in BLEU score, and the best result outperforms the baseline (a Transformer ensemble model trained on the original parallel corpus) by approximately 5.3 BLEU points, achieving state-of-the-art performance.
Anthology ID:
W19-5314
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
175–182
URL:
https://aclanthology.org/W19-5314
DOI:
10.18653/v1/W19-5314
Cite (ACL):
Liang Ding and Dacheng Tao. 2019. The University of Sydney’s Machine Translation System for WMT19. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 175–182, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
The University of Sydney’s Machine Translation System for WMT19 (Ding & Tao, WMT 2019)
PDF:
https://aclanthology.org/W19-5314.pdf
Data
WMT 2016, WMT 2016 News, WMT 2018