Abstract
This paper presents Bering Lab’s submission to the shared tasks of the 8th Workshop on Asian Translation (WAT 2021) on JPC2 and NICT-SAP. We participated in all tasks on JPC2 and in the IT domain tasks on NICT-SAP. Our approach for all tasks focused mainly on building NMT systems on domain-specific corpora. We crawled patent document pairs for English-Japanese, Chinese-Japanese, and Korean-Japanese. After cleaning noisy data, we built a parallel corpus by aligning sentences using sentence-level similarity scores. For the SAP test data, we also collected OPUS data, including three IT domain corpora. We then trained a Transformer on the collected dataset. Our submission ranked 1st in eight out of fourteen tasks, achieving improvements of up to 2.87 BLEU on JPC2 and 8.79 BLEU on NICT-SAP.
- Anthology ID:
- 2021.wat-1.15
- Volume:
- Proceedings of the 8th Workshop on Asian Translation (WAT2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
- Venue:
- WAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 141–145
- URL:
- https://aclanthology.org/2021.wat-1.15
- DOI:
- 10.18653/v1/2021.wat-1.15
- Bibkey:
- park-lee-2021-bering
- Cite (ACL):
- Heesoo Park and Dongjun Lee. 2021. Bering Lab’s Submissions on WAT 2021 Shared Task. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 141–145, Online. Association for Computational Linguistics.
- Cite (Informal):
- Bering Lab’s Submissions on WAT 2021 Shared Task (Park & Lee, WAT 2021)
- PDF:
- https://aclanthology.org/2021.wat-1.15.pdf
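The abstract mentions building a parallel corpus by aligning sentences using sentence-level similarity scores, but it does not say which similarity measure or matching rule was used. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes each sentence has already been mapped to a vector by some multilingual encoder, scores pairs with cosine similarity, and keeps only mutual best matches above a threshold. The names `cosine` and `align_sentences` and the default threshold of 0.8 are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_sentences(
    src_vecs: List[Sequence[float]],
    tgt_vecs: List[Sequence[float]],
    threshold: float = 0.8,  # assumed cutoff, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (src_index, tgt_index, score) for mutual best matches
    whose similarity is at or above the threshold."""
    if not src_vecs or not tgt_vecs:
        return []
    # Full similarity matrix: rows are source sentences, columns are targets.
    scores = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target for each source, and best source for each target.
    best_tgt = [max(range(len(tgt_vecs)), key=row.__getitem__) for row in scores]
    best_src = [
        max(range(len(src_vecs)), key=lambda i, j=j: scores[i][j])
        for j in range(len(tgt_vecs))
    ]
    pairs = []
    for i, j in enumerate(best_tgt):
        # Keep a pair only if each side picks the other and the score is high enough.
        if best_src[j] == i and scores[i][j] >= threshold:
            pairs.append((i, j, scores[i][j]))
    return pairs


if __name__ == "__main__":
    # Toy vectors standing in for sentence embeddings from one document pair.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tgt = [[0.0, 1.0], [1.0, 0.1]]
    for i, j, score in align_sentences(src, tgt):
        print(f"src[{i}] <-> tgt[{j}]  score={score:.3f}")
```

Requiring mutual best matches is one simple way to drop noisy pairs: a source sentence with no true counterpart tends to lose its preferred target to a better-matching source. Raising the threshold trades corpus size for cleaner pairs.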