Exploring Adequacy Errors in Neural Machine Translation with the Help of Cross-Language Aligned Word Embeddings

Michael Ustaszewski

doi:10.26615/issn.2683-0078.2019_015

Abstract

Neural machine translation (NMT) has been shown to produce more fluent output than phrase-based statistical machine translation (PBMT) and rule-based machine translation (RBMT). However, improved fluency makes it more difficult for post-editors to identify and correct adequacy errors because, unlike in RBMT and PBMT, adequacy errors in NMT are frequently not signalled by accompanying fluency errors. Omissions and additions of content in otherwise flawlessly fluent NMT output are the most prominent types of such adequacy errors, and they can only be detected with reference to the source text. This contribution explores the degree of semantic similarity between source texts, NMT output and post-edited output, relating computational semantic similarity scores (cosine similarity) to human quality judgments. The analyses are based on publicly available NMT post-editing data annotated for errors in three language pairs (EN-DE, EN-LV, EN-HR) with the Multidimensional Quality Metrics (MQM) framework. Methodologically, this contribution tests whether cross-language aligned word embeddings, as the sole source of semantic information, mirror human error annotation.
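As a rough illustration of the kind of comparison the abstract describes, the following minimal sketch scores a source sentence against a machine-translated and a post-edited target sentence using cosine similarity in a shared embedding space. It rests on several assumptions that are not taken from the paper: sentences are represented by averaging their word vectors, the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, and the helper names `sentence_vector` and `cosine_similarity` are hypothetical. The paper's actual pipeline may differ.

```python
import math

# Hypothetical toy vectors in a shared (cross-language aligned) space.
# Real aligned embeddings have hundreds of dimensions; these values are invented.
TOY_EMBEDDINGS = {
    "en": {
        "the": [0.10, 0.00, 0.10],
        "cat": [0.90, 0.10, 0.00],
        "sleeps": [0.00, 0.80, 0.20],
        "today": [0.10, 0.10, 0.90],
    },
    "de": {
        "die": [0.10, 0.00, 0.10],
        "katze": [0.88, 0.12, 0.02],
        "schläft": [0.02, 0.79, 0.20],
        "heute": [0.10, 0.12, 0.88],
    },
}


def sentence_vector(tokens, lang):
    """Average the vectors of all known tokens (an assumed sentence representation)."""
    vectors = [TOY_EMBEDDINGS[lang][t] for t in tokens if t in TOY_EMBEDDINGS[lang]]
    if not vectors:
        raise ValueError("no token of the sentence has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


source = "the cat sleeps today".split()
mt_output = "die katze schläft".split()          # fluent, but omits "today"
post_edited = "die katze schläft heute".split()  # omission repaired

src_vec = sentence_vector(source, "en")
sim_mt = cosine_similarity(src_vec, sentence_vector(mt_output, "de"))
sim_pe = cosine_similarity(src_vec, sentence_vector(post_edited, "de"))

print(f"source vs. MT output:   {sim_mt:.3f}")
print(f"source vs. post-edited: {sim_pe:.3f}")
```

With these toy vectors, the omission in the raw output lowers its similarity to the source relative to the post-edited version. Whether real aligned embeddings reflect annotated adequacy errors in this way is the question the paper investigates.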

Anthology ID: W19-8715
Volume: Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)
Month: September
Year: 2019
Address: Varna, Bulgaria
Venue: RANLP
Publisher: Incoma Ltd., Shoumen, Bulgaria
Pages: 122–128
URL: https://aclanthology.org/W19-8715/
DOI: 10.26615/issn.2683-0078.2019_015
Cite (ACL): Michael Ustaszewski. 2019. Exploring Adequacy Errors in Neural Machine Translation with the Help of Cross-Language Aligned Word Embeddings. In Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019), pages 122–128, Varna, Bulgaria. Incoma Ltd., Shoumen, Bulgaria.
Cite (Informal): Exploring Adequacy Errors in Neural Machine Translation with the Help of Cross-Language Aligned Word Embeddings (Ustaszewski, RANLP 2019)
PDF: https://aclanthology.org/W19-8715.pdf
