EnKhCorp1.0: An English–Khasi Corpus

Sahinur Rahman Laskar, Abdullah Faiz Ur Rahman Khilji, Darsh Kaushik, Partha Pakray, Sivaji Bandyopadhyay


Abstract
In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource language pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with different linguistic backgrounds. The online automatic translation systems from Google and Microsoft cover many languages but lack support for Khasi, which can therefore be considered a low-resource language. This paper describes the development of EnKhCorp1.0, a corpus for the English–Khasi pair, and implements baseline English-to-Khasi and Khasi-to-English translation systems based on the neural machine translation approach.
Anthology ID: 2021.mtsummit-loresmt.9
Volume: Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)
Month: August
Year: 2021
Address: Virtual
Editors: John Ortega, Atul Kr. Ojha, Katharina Kann, Chao-Hong Liu
Venue: LoResMT
Publisher: Association for Machine Translation in the Americas
Pages: 89–95
URL: https://aclanthology.org/2021.mtsummit-loresmt.9
Cite (ACL): Sahinur Rahman Laskar, Abdullah Faiz Ur Rahman Khilji, Darsh Kaushik, Partha Pakray, and Sivaji Bandyopadhyay. 2021. EnKhCorp1.0: An English–Khasi Corpus. In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), pages 89–95, Virtual. Association for Machine Translation in the Americas.
Cite (Informal): EnKhCorp1.0: An English–Khasi Corpus (Laskar et al., LoResMT 2021)
PDF: https://aclanthology.org/2021.mtsummit-loresmt.9.pdf