Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations

Md Mahfuz Ibn Alam, Antonios Anastasopoulos


Abstract
The performance of neural machine translation (NMT) systems trained only on a single language variant degrades when confronted with even slightly different language variations. With this work, we build upon previous work to explore how to mitigate this issue. We show that fine-tuning using naturally occurring noise along with pseudo-references (i.e., “corrected” non-native inputs translated using the baseline NMT system) is a promising solution towards systems robust to such types of input variations. We focus on four translation pairs, from English to Spanish, Italian, French, and Portuguese, with our system achieving improvements of up to 3.1 BLEU points compared to the baselines, establishing a new state-of-the-art on the JFLEG-ES dataset. All datasets and code are publicly available here: https://github.com/mahfuzibnalam/finetuning_for_robustness.
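Below is a minimal sketch of the pseudo-reference construction described in the abstract: corrected versions of non-native inputs are translated with a frozen baseline NMT model, and the resulting translations are paired with the original noisy inputs as fine-tuning data. The Hugging Face MarianMT checkpoint, the example sentences, and the helper names are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: assumes `transformers` is installed and uses a public
# Helsinki-NLP checkpoint as a stand-in for the paper's baseline NMT system.
from transformers import MarianMTModel, MarianTokenizer

BASELINE = "Helsinki-NLP/opus-mt-en-es"  # assumed English->Spanish baseline
tokenizer = MarianTokenizer.from_pretrained(BASELINE)
model = MarianMTModel.from_pretrained(BASELINE)

def translate(sentences, batch_size=16):
    """Translate source sentences with the (frozen) baseline model."""
    outputs = []
    for i in range(0, len(sentences), batch_size):
        batch = tokenizer(sentences[i:i + batch_size], return_tensors="pt",
                          padding=True, truncation=True)
        generated = model.generate(**batch)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs

# Hypothetical example data: a noisy learner sentence and its correction
# (in the paper, such pairs come from corpora like FCE and JFLEG).
noisy_src = ["She no went to the market yesterday ."]
corrected_src = ["She did not go to the market yesterday ."]

# Pseudo-references: baseline translations of the *corrected* inputs.
pseudo_refs = translate(corrected_src)

# Fine-tuning pairs: noisy source sentence mapped to the pseudo-reference,
# so the model learns to produce the "clean" translation from noisy input.
finetune_pairs = list(zip(noisy_src, pseudo_refs))
print(finetune_pairs)
```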
Anthology ID:
2020.wnut-1.20
Volume:
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month:
November
Year:
2020
Address:
Online
Editors:
Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
149–158
URL:
https://aclanthology.org/2020.wnut-1.20
DOI:
10.18653/v1/2020.wnut-1.20
Cite (ACL):
Md Mahfuz Ibn Alam and Antonios Anastasopoulos. 2020. Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 149–158, Online. Association for Computational Linguistics.
Cite (Informal):
Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations (Alam & Anastasopoulos, WNUT 2020)
PDF:
https://aclanthology.org/2020.wnut-1.20.pdf
Code
 mahfuzibnalam/finetuning_for_robustness
Data
FCE, JFLEG, MTNT, OPUS-MT