In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model

Yanzhi Tian, Xiang Li, Zeming Liu, Yuhang Guo, Bin Wang


Abstract
In-Image Machine Translation (IIMT) aims to convert images containing text from one language to another. Traditional approaches to this task are cascade methods, which apply optical character recognition (OCR) followed by neural machine translation (NMT) and text rendering. However, cascade methods suffer from compounding OCR and NMT errors, which lowers translation quality. In this paper, we propose an end-to-end model that replaces the OCR, NMT, and text rendering pipeline. Our neural architecture adopts the encoder-decoder paradigm, with segmented pixel sequences as inputs and outputs. Through end-to-end training, our model yields improvements along three dimensions: (i) it achieves higher translation quality by avoiding error propagation, (ii) it is robust to out-of-domain data, and (iii) it is insensitive to incomplete words. To validate the effectiveness of our method and to support future research, we construct a dataset containing 4M pairs of De-En images and train our end-to-end model on it. The experimental results show that our approach outperforms both the cascade method and the current end-to-end model.
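As a rough illustration of what "segmented pixel sequences" could look like, the sketch below cuts a grayscale text-line image into fixed-width vertical slices and merges them back. The abstract does not say how the segmentation is done, so the fixed-width slicing, the white padding value, and the names `segment_image` and `merge_segments` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0 (black) to 255 (white)


def segment_image(image: Image, segment_width: int, pad_value: int = 255) -> List[Image]:
    """Split an image into a left-to-right sequence of vertical slices.

    Assumption: fixed-width slices, with the right edge padded in white so
    that the width becomes a multiple of segment_width.
    """
    width = len(image[0])
    pad = (-width) % segment_width  # columns needed to reach the next multiple
    padded = [row + [pad_value] * pad for row in image]
    return [
        [row[start:start + segment_width] for row in padded]
        for start in range(0, width + pad, segment_width)
    ]


def merge_segments(segments: List[Image], original_width: int) -> Image:
    """Concatenate slices back into one image and drop the padding columns."""
    height = len(segments[0])
    return [
        [pixel for segment in segments for pixel in segment[r]][:original_width]
        for r in range(height)
    ]


# Toy 2x5 "image": two rows of five pixels each.
toy_image = [[0, 10, 20, 30, 40],
             [50, 60, 70, 80, 90]]
toy_segments = segment_image(toy_image, segment_width=2)
restored = merge_segments(toy_segments, original_width=5)
```

In a sequence-to-sequence setting, each slice would play the role that a token plays in text NMT: an encoder would read the source slices and a decoder would emit target slices, which are then merged into the output image. How the slices are embedded and decoded is not covered by this sketch.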
Anthology ID:
2023.findings-emnlp.1004
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15046–15057
URL:
https://aclanthology.org/2023.findings-emnlp.1004
DOI:
10.18653/v1/2023.findings-emnlp.1004
Cite (ACL):
Yanzhi Tian, Xiang Li, Zeming Liu, Yuhang Guo, and Bin Wang. 2023. In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15046–15057, Singapore. Association for Computational Linguistics.
Cite (Informal):
In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model (Tian et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.1004.pdf