Enhancing Neural Machine Translation for Ainu-Japanese: A Comprehensive Study on the Impact of Domain and Dialect Integration

Ryo Igarashi, So Miyagawa


Abstract
Neural Machine Translation (NMT) has revolutionized language translation, yet significant challenges persist for low-resource languages, particularly those with high dialectal variation and limited standardization. This comprehensive study focuses on the Ainu language, a critically endangered indigenous language of northern Japan, which epitomizes these challenges. We address the limitations of previous research through two primary strategies: (1) extensive corpus expansion encompassing diverse domains and dialects, and (2) development of innovative methods to incorporate dialect and domain information directly into the translation process. Our approach yielded substantial improvements in translation quality, with BLEU scores increasing from 32.90 to 39.06 (+6.16) for Japanese → Ainu and from 10.45 to 31.83 (+21.38) for Ainu → Japanese. Through rigorous experimentation and analysis, we demonstrate the crucial importance of integrating linguistic variation information in NMT systems for languages characterized by high diversity and limited resources. Our findings have broad implications for improving machine translation for other low-resource languages, potentially advancing preservation and revitalization efforts for endangered languages worldwide.
Anthology ID:
2024.nlp4dh-1.40
Volume:
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month:
November
Year:
2024
Address:
Miami, USA
Editors:
Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue:
NLP4DH
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
413–422
Language:
URL:
https://aclanthology.org/2024.nlp4dh-1.40
DOI:
Bibkey:
Cite (ACL):
Ryo Igarashi and So Miyagawa. 2024. Enhancing Neural Machine Translation for Ainu-Japanese: A Comprehensive Study on the Impact of Domain and Dialect Integration. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 413–422, Miami, USA. Association for Computational Linguistics.
Cite (Informal):
Enhancing Neural Machine Translation for Ainu-Japanese: A Comprehensive Study on the Impact of Domain and Dialect Integration (Igarashi & Miyagawa, NLP4DH 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.nlp4dh-1.40.pdf