CCIM: Cross-modal Cross-lingual Interactive Image Translation

Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong


Abstract
Text image machine translation (TIMT), which translates source-language text images into target-language texts, has attracted intensive attention in recent years. Although the end-to-end TIMT model directly generates the target translation from encoded text image features with an efficient architecture, it lacks the recognized source-language information, resulting in a decrease in translation performance. In this paper, we propose a novel Cross-modal Cross-lingual Interactive Model (CCIM) that incorporates source-language information by synchronously generating source-language and target-language results through an interactive attention mechanism between two language decoders. Extensive experimental results show that the interactive decoder significantly outperforms end-to-end TIMT models and achieves faster decoding speed with a smaller model size than cascade models.
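To make the abstract's "interactive attention mechanism between two language decoders" concrete, below is a minimal PyTorch sketch of the general idea: an image encoder's features feed two decoders (one for the source language, one for the target language), and at each layer every decoder additionally attends to the other decoder's hidden states. Module names, dimensions, and the exact attention wiring are illustrative assumptions, not the authors' released implementation, and causal masks are omitted for brevity.

# Illustrative sketch only; not the authors' code. Assumes a transformer-style
# decoder layer with an extra "interactive" cross-attention over the other
# decoder's states, as suggested by the abstract.
import torch
import torch.nn as nn


class InteractiveDecoderLayer(nn.Module):
    """Decoder layer with self-attention, image cross-attention, and an
    interactive cross-attention over the other decoder's hidden states."""

    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.image_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.interactive_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, x, image_memory, other_decoder_states):
        # Standard self-attention over this decoder's own states (no causal mask here).
        x = self.norms[0](x + self.self_attn(x, x, x)[0])
        # Cross-attention over the encoded text-image features.
        x = self.norms[1](x + self.image_attn(x, image_memory, image_memory)[0])
        # Interactive attention: condition on the other language's decoder states.
        x = self.norms[2](x + self.interactive_attn(
            x, other_decoder_states, other_decoder_states)[0])
        return self.norms[3](x + self.ffn(x))


def interactive_step(src_layer, tgt_layer, src_states, tgt_states, image_memory):
    """One synchronized layer update: each decoder attends to the image
    features and to the other decoder's current hidden states."""
    new_src = src_layer(src_states, image_memory, tgt_states)
    new_tgt = tgt_layer(tgt_states, image_memory, src_states)
    return new_src, new_tgt


if __name__ == "__main__":
    d_model, batch, img_len, src_len, tgt_len = 256, 2, 50, 12, 10
    image_memory = torch.randn(batch, img_len, d_model)  # encoded text-image features
    src_states = torch.randn(batch, src_len, d_model)    # source-language decoder states
    tgt_states = torch.randn(batch, tgt_len, d_model)    # target-language decoder states
    src_layer, tgt_layer = InteractiveDecoderLayer(), InteractiveDecoderLayer()
    src_states, tgt_states = interactive_step(src_layer, tgt_layer,
                                              src_states, tgt_states, image_memory)
    print(src_states.shape, tgt_states.shape)  # (2, 12, 256) and (2, 10, 256)

In this sketch the two decoders share the image memory but keep separate parameters, so the source-language (recognition) branch can inform the target-language (translation) branch at every layer rather than only through a cascaded OCR output.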
Anthology ID:
2023.findings-emnlp.330
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4959–4965
URL:
https://aclanthology.org/2023.findings-emnlp.330
DOI:
10.18653/v1/2023.findings-emnlp.330
Cite (ACL):
Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, and Chengqing Zong. 2023. CCIM: Cross-modal Cross-lingual Interactive Image Translation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4959–4965, Singapore. Association for Computational Linguistics.
Cite (Informal):
CCIM: Cross-modal Cross-lingual Interactive Image Translation (Ma et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.330.pdf