Bering Lab’s Submissions on WAT 2021 Shared Task

Heesoo Park, Dongjun Lee


Abstract
This paper presents the Bering Lab’s submission to the shared tasks of the 8th Workshop on Asian Translation (WAT 2021) on JPC2 and NICT-SAP. We participated in all tasks on JPC2 and in the IT-domain tasks on NICT-SAP. For all tasks, our approach focused mainly on building NMT systems on domain-specific corpora. We crawled patent document pairs for English-Japanese, Chinese-Japanese, and Korean-Japanese. After cleaning the noisy data, we built a parallel corpus by aligning the sentences using sentence-level similarity scores. For the SAP test data, we also collected the OPUS dataset, including three IT-domain corpora. We then trained Transformer models on the collected data. Our submission ranked 1st in eight out of fourteen tasks, achieving BLEU improvements of up to 2.87 on JPC2 and 8.79 on NICT-SAP.
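The abstract does not say how the sentence-level similarity scores were computed or how sentences were paired. As a minimal sketch, assuming every sentence has already been embedded into a shared multilingual vector space by some encoder (not specified in the source), one simple way to turn similarity scores into a parallel corpus is greedy one-to-one matching on cosine similarity with a cut-off threshold. The names `cosine` and `align_by_similarity`, the 0.8 threshold, and the toy vectors are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_by_similarity(
    src_vecs: List[Sequence[float]],
    tgt_vecs: List[Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> List[Tuple[int, int, float]]:
    """Hypothetical greedy aligner: repeatedly keep the highest-scoring
    (source, target) pair whose sentences are both still unused, stopping
    once the best remaining score falls below `threshold`."""
    scored = [
        (cosine(s, t), i, j)
        for i, s in enumerate(src_vecs)
        for j, t in enumerate(tgt_vecs)
    ]
    scored.sort(reverse=True)  # best similarity first
    used_src, used_tgt = set(), set()
    pairs = []
    for score, i, j in scored:
        if score < threshold:
            break  # every remaining candidate scores even lower
        if i in used_src or j in used_tgt:
            continue  # each sentence may appear in at most one pair
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)  # report pairs in source order


if __name__ == "__main__":
    # Toy 2-d "embeddings" standing in for real sentence vectors.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tgt = [[0.0, 2.0], [3.0, 0.1], [-1.0, 0.0]]
    pairs = align_by_similarity(src, tgt, threshold=0.8)
    print([(i, j) for i, j, _ in pairs])  # → [(0, 1), (1, 0)]
```

Greedy matching is used here only for brevity: it keeps each sentence in at most one pair and discards low-scoring candidates (source sentence 2 above stays unaligned), which is the basic behaviour a similarity-based alignment step needs. Other pairing schemes, such as optimal bipartite matching, may be preferable in practice.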
Anthology ID:
2021.wat-1.15
Volume:
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
Venue:
WAT
Publisher:
Association for Computational Linguistics
Pages:
141–145
URL:
https://aclanthology.org/2021.wat-1.15
DOI:
10.18653/v1/2021.wat-1.15
Cite (ACL):
Heesoo Park and Dongjun Lee. 2021. Bering Lab’s Submissions on WAT 2021 Shared Task. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 141–145, Online. Association for Computational Linguistics.
Cite (Informal):
Bering Lab’s Submissions on WAT 2021 Shared Task (Park & Lee, WAT 2021)
PDF:
https://aclanthology.org/2021.wat-1.15.pdf