Kota Manabe


2026

The objective of this paper is to improve machine translation for manga (Japanese comics) by developing an image encoder that more accurately comprehends the visual context of manga. Conventional manga machine translation systems have lacked sufficient manga comprehension capability when exploiting image information. To address this issue, we propose a domain-adapted training method for manga image encoders. The proposed method trains encoders to acquire visual features that reflect the structural and sequential characteristics of manga, drawing on a technique that has proven highly effective in training language models. The image encoders trained with the proposed method serve as the visual processors of a multimodal machine translation model and are evaluated on a Japanese-English translation task. The experimental results demonstrate that the proposed method improves translation evaluation metrics such as BLEU and xCOMET compared with the conventional method.