NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs

Kenji Imamura, Eiichiro Sumita


Abstract
In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021). A feature of our system is that it uses a pretrained multilingual BART (Bidirectional and Auto-Regressive Transformer; mBART) model. Because publicly available models do not support some of the languages in the NICT-SAP task, we added these languages to the mBART model and then trained it on monolingual corpora extracted from Wikipedia. We fine-tuned the expanded mBART model on the parallel corpora specified by the NICT-SAP task. BLEU scores improved greatly over those of systems built without the pretrained model, including for the language pairs involving the added languages.
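As a rough illustration of the vocabulary-extension step the abstract describes, the sketch below uses the HuggingFace transformers API rather than the authors' actual training pipeline, and the added language codes are placeholders for whichever NICT-SAP languages the public checkpoint lacks; it is a minimal sketch under those assumptions, not the paper's implementation.

```python
# Minimal sketch: extend a public mBART checkpoint with new language
# codes before continued pretraining and fine-tuning.
# Assumptions: HuggingFace transformers is installed; the language
# codes below are illustrative placeholders, not the paper's exact list.
from transformers import MBartForConditionalGeneration, MBartTokenizer

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")

# Register language-code tokens absent from the public model.
new_lang_codes = ["th_TH", "ms_MY", "id_ID"]  # hypothetical additions
num_added = tokenizer.add_tokens(new_lang_codes, special_tokens=True)

# Grow the embedding (and tied output) matrices to cover the new
# tokens; the new rows start randomly initialized and must be learned
# during continued pretraining on monolingual text.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} language codes; vocab size = {len(tokenizer)}")
```

After a step like this, the model would be trained with mBART's denoising objective on Wikipedia monolingual text for the added languages, and only then fine-tuned on the task's parallel corpora.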
Anthology ID:
2021.wat-1.8
Volume:
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
Venue:
WAT
Publisher:
Association for Computational Linguistics
Pages:
90–95
URL:
https://aclanthology.org/2021.wat-1.8
DOI:
10.18653/v1/2021.wat-1.8
Cite (ACL):
Kenji Imamura and Eiichiro Sumita. 2021. NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 90–95, Online. Association for Computational Linguistics.
Cite (Informal):
NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs (Imamura & Sumita, WAT 2021)
PDF:
https://aclanthology.org/2021.wat-1.8.pdf