NTT’s Machine Translation Systems for WMT19 Robustness Task

Soichiro Murakami, Makoto Morishita, Tsutomu Hirao, Masaaki Nagata


Abstract
This paper describes NTT’s submission to the WMT19 robustness task. This task mainly focuses on translating noisy text (e.g., posts on Twitter), which presents different difficulties from typical translation tasks such as news. Our submission combined techniques including utilization of a synthetic corpus, domain adaptation, and a placeholder mechanism, which significantly improved over the previous baseline. Experimental results revealed that the placeholder mechanism, which temporarily replaces non-standard tokens such as emojis and emoticons with special placeholder tokens during translation, improves translation accuracy even with noisy texts.
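The placeholder mechanism mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the emoji pattern, the `<PH0>`-style placeholder format, and the function names `mask` and `unmask` are all assumptions made for illustration, and the sketch handles only emojis, not emoticons.

```python
import re

# Assumed emoji pattern covering a few common Unicode blocks.
# This is an illustrative choice, not the token list used in the paper.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Assumed placeholder format: <PH0>, <PH1>, ...
PLACEHOLDER_RE = re.compile(r"<PH(\d+)>")


def mask(text):
    """Replace each emoji with an indexed placeholder; return text and originals."""
    originals = []

    def repl(match):
        originals.append(match.group(0))
        return f"<PH{len(originals) - 1}>"

    return EMOJI_RE.sub(repl, text), originals


def unmask(text, originals):
    """Put the original tokens back wherever their placeholders appear."""

    def repl(match):
        idx = int(match.group(1))
        # Leave unknown placeholders untouched rather than failing.
        return originals[idx] if idx < len(originals) else match.group(0)

    return PLACEHOLDER_RE.sub(repl, text)


masked, originals = mask("great game 😂 see you 🎉")
# A translation system would translate `masked` here; the placeholders are
# expected to pass through unchanged, possibly in a different order.
restored = unmask("<PH1> またね <PH0>", originals)
```

In this sketch the translation model never sees the emojis themselves, only the placeholder tokens, and the originals are restored after translation by index.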
Anthology ID:
W19-5365
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
544–551
URL:
https://aclanthology.org/W19-5365
DOI:
10.18653/v1/W19-5365
Cite (ACL):
Soichiro Murakami, Makoto Morishita, Tsutomu Hirao, and Masaaki Nagata. 2019. NTT’s Machine Translation Systems for WMT19 Robustness Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 544–551, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
NTT’s Machine Translation Systems for WMT19 Robustness Task (Murakami et al., WMT 2019)
PDF:
https://aclanthology.org/W19-5365.pdf
Data:
JESC, MTNT