Daiki Matsui

2026

Probabilistic Bilingual Subword Segmentation with Latent Subword Alignment
Shoto Nishida | Daiki Matsui | Takashi Ninomiya | Isao Goto | Akihiro Tamura
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

This study proposes a method for learning subword correspondences in parallel sentence pairs using the EM algorithm. Conventional neural machine translation typically employs subword segmentation models that are trained without considering parallel relationships, so inconsistencies in segmentation between the source and target languages may hinder translation model training. Our approach directly models subword correspondences in parallel corpora, thereby improving segmentation consistency across languages. Experiments on multiple machine translation tasks confirm that the proposed method improves translation accuracy on many of them.
