AdapNMT : Neural Machine Translation with Technical Domain Adaptation for Indic Languages

Hema Ala, Dipti Sharma


Abstract
Adapting to a new domain is a highly challenging task for Neural Machine Translation (NMT). In this paper, we show the capability of general-domain machine translation when translating into Indic languages (English-Hindi, English-Telugu, and Hindi-Telugu), and we study low-resource domain adaptation of MT systems using existing general parallel data together with a small amount of in-domain parallel data for the AI and Chemistry domains. We carried out our experiments using Byte Pair Encoding (BPE), as it addresses the rare-word problem. We observed that adding a small amount of in-domain data to the general data improves the BLEU score significantly.
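To illustrate why BPE helps with rare words, here is a minimal sketch of BPE merge learning and segmentation in Python. It is a toy illustration only and not the authors' pipeline: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the tiny corpus in the usage note are assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = Counter({merge_pair(s, best): f for s, f in vocab.items()})
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subword units by replaying the merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

For example, after `merges = learn_bpe(["low", "low", "lower", "newest", "newest", "widest"], 10)`, calling `segment("lowest", merges)` splits the unseen word into units learned from the training words. A rare or unseen word is therefore represented as a sequence of known subwords rather than a single unknown token.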
Anthology ID:
2020.icon-adapmt.2
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task
Month:
December
Year:
2020
Address:
Patna, India
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
6–10
URL:
https://aclanthology.org/2020.icon-adapmt.2
Cite (ACL):
Hema Ala and Dipti Sharma. 2020. AdapNMT : Neural Machine Translation with Technical Domain Adaptation for Indic Languages. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task, pages 6–10, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
AdapNMT : Neural Machine Translation with Technical Domain Adaptation for Indic Languages (Ala & Sharma, ICON 2020)
PDF:
https://aclanthology.org/2020.icon-adapmt.2.pdf