Analyzing Challenges in Neural Machine Translation for Software Localization

Sai Koneru, Matthias Huck, Miriam Exel, Jan Niehues


Abstract
Advancements in Neural Machine Translation (NMT) greatly benefit the software localization industry by decreasing the post-editing time of human annotators. Although the volume of the software being localized is growing significantly, techniques for improving NMT for user interface (UI) texts are lacking. These UI texts have different properties than other collections of texts, presenting unique challenges for NMT. For example, they are often very short, causing them to be ambiguous and needing additional context (button, title text, a table item, etc.) for disambiguation. However, no such UI data sets are readily available with contextual information for NMT models to exploit. This work aims to provide a first step in improving UI translations and highlight its challenges. To achieve this, we provide a novel multilingual UI corpus collection (∼1.3M for English-German) with a targeted test set and analyze the limitations of state-of-the-art methods on this challenging task. Specifically, we present a targeted test set for disambiguation from English to German to evaluate reliably and emphasize UI translation challenges. Furthermore, we evaluate several state-of-the-art NMT techniques from domain adaptation and document-level NMT on this challenging task. All the scripts to replicate the experiments and data sets are available here.
Anthology ID:
2023.eacl-main.179
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2442–2454
URL:
https://aclanthology.org/2023.eacl-main.179
DOI:
10.18653/v1/2023.eacl-main.179
Cite (ACL):
Sai Koneru, Matthias Huck, Miriam Exel, and Jan Niehues. 2023. Analyzing Challenges in Neural Machine Translation for Software Localization. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2442–2454, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Analyzing Challenges in Neural Machine Translation for Software Localization (Koneru et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.179.pdf
Video:
 https://aclanthology.org/2023.eacl-main.179.mp4