Takeshi Hayakawa


2020

pdf bib
Fine-Grained Error Analysis on English-to-Japanese Machine Translation in the Medical Domain
Takeshi Hayakawa | Yuki Arase
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We performed a detailed error analysis in domain-specific neural machine translation (NMT) for the English and Japanese language pair with fine-grained manual annotation. Despite its importance for advancing NMT technologies, research on the performance of domain-specific NMT and non-European languages has been limited. In this study, we designed an error typology based on the error types that were typically generated by NMT systems and might cause significant impact in technical translations: “Addition,” “Omission,” “Mistranslation,” “Grammar,” and “Terminology.” The error annotation was targeted to the medical domain and was performed by experienced professional translators specialized in medicine under careful quality control. The annotation detected 4,912 errors on 2,480 sentences, and the frequency and distribution of errors were analyzed. We found that the major errors in NMT were “Mistranslation” and “Terminology” rather than “Addition” and “Omission,” which have been reported as typical problems of NMT. Interestingly, more errors occurred in documents for professionals compared with those for the general public. The results of our annotation work will be published as a parallel corpus with error labels, which are expected to contribute to developing better NMT models, automatic evaluation metrics, and quality estimation models.
Search
Co-authors
Venues