Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness

Alexandre Berard, Ioan Calapodescu, Marc Dymetman, Claude Roux, Jean-Luc Meunier, Vassilina Nikoulina


Abstract
We share a French-English parallel corpus of Foursquare restaurant reviews, and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over existing online systems. Finally, we propose task-specific metrics based on sentiment analysis or translation accuracy of domain-specific polysemous words.
Anthology ID: D19-5617
Volume: Proceedings of the 3rd Workshop on Neural Generation and Translation
Month: November
Year: 2019
Address: Hong Kong
Venues: EMNLP | NGT | WS
Publisher: Association for Computational Linguistics
Pages: 168–176
URL: https://aclanthology.org/D19-5617
DOI: 10.18653/v1/D19-5617
Cite (ACL): Alexandre Berard, Ioan Calapodescu, Marc Dymetman, Claude Roux, Jean-Luc Meunier, and Vassilina Nikoulina. 2019. Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 168–176, Hong Kong. Association for Computational Linguistics.
Cite (Informal): Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness (Berard et al., EMNLP 2019)
PDF: https://aclanthology.org/D19-5617.pdf
Attachment: D19-5617.Attachment.zip
Data: OpenSubtitles