Robust Neural Machine Translation with ASR Errors

Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen


Abstract
In many practical applications, neural machine translation systems must handle input from automatic speech recognition (ASR) systems, which may contain errors. This leads to two problems that degrade translation performance. One is the discrepancy between the training and test data; the other is that translation errors caused by the input errors may ruin the whole translation. In this paper, we propose a method that addresses both problems so as to generate translations that are robust to ASR errors. First, we simulate ASR errors in the training data so that the training and test data distributions are consistent. Second, we focus on ASR errors involving homophones and words with similar pronunciation, and use their pronunciation information to help the translation model recover from the input errors. Experiments on two Chinese-English data sets show that our method is more robust to input errors and significantly outperforms a strong Transformer baseline.
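The first step described above, simulating ASR errors in the training data, can be illustrated with a minimal sketch. The abstract does not specify the actual noising procedure, so everything here is an assumption made for illustration: the function name `inject_homophone_noise`, the toy `HOMOPHONES` table, and the per-token replacement probability `rate` are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy table of Chinese homophones (same pinyin, different characters).
# A real system would derive such a table from a pronunciation lexicon.
HOMOPHONES = {
    "他": ["她", "它"],      # all pronounced "ta"
    "在": ["再"],            # both pronounced "zai"
    "权利": ["权力"],        # both pronounced "quanli"
}


def inject_homophone_noise(tokens, table, rate, rng):
    """Return a copy of `tokens` in which each token that has an entry in
    `table` is replaced by a randomly chosen homophone with probability `rate`.
    Tokens without an entry are always left unchanged."""
    noisy = []
    for tok in tokens:
        candidates = table.get(tok)
        if candidates and rng.random() < rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(tok)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the noise is reproducible
    source = ["他", "在", "北京", "工作"]
    print(inject_homophone_noise(source, HOMOPHONES, rate=0.3, rng=rng))
```

Under these assumptions, the noised source sentence would be paired with the original, clean target sentence, so that the model sees ASR-like input during training while still learning to produce the correct translation.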
Anthology ID:
2020.autosimtrans-1.3
Volume:
Proceedings of the First Workshop on Automatic Simultaneous Translation
Month:
July
Year:
2020
Address:
Seattle, Washington
Editors:
Hua Wu, Colin Cherry, Liang Huang, Zhongjun He, Mark Liberman, James Cross, Yang Liu
Venue:
AutoSimTrans
Publisher:
Association for Computational Linguistics
Pages:
15–23
URL:
https://aclanthology.org/2020.autosimtrans-1.3
DOI:
10.18653/v1/2020.autosimtrans-1.3
Cite (ACL):
Haiyang Xue, Yang Feng, Shuhao Gu, and Wei Chen. 2020. Robust Neural Machine Translation with ASR Errors. In Proceedings of the First Workshop on Automatic Simultaneous Translation, pages 15–23, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Robust Neural Machine Translation with ASR Errors (Xue et al., AutoSimTrans 2020)
PDF:
https://aclanthology.org/2020.autosimtrans-1.3.pdf
Video:
http://slideslive.com/38929919