First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages

Tonja Atnafu Lambebo, Mersha Melkamu, Kalita Ananya, Kolesnikova Olga, Kalita Jugal


Abstract
This paper presents the first-ever parallel corpora for thirteen very low-resource languages of India, all from Northeast India, together with initial benchmark neural machine translation results for these languages. We intend to extend these corpora to cover a large number of low-resource Indian languages and to integrate the effort with our prior work on African and American-Indian languages, with the goal of building corpora that span a large number of languages from across the world.
Anthology ID:
2023.icon-1.49
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
534–539
URL:
https://aclanthology.org/2023.icon-1.49
Cite (ACL):
Tonja Atnafu Lambebo, Mersha Melkamu, Kalita Ananya, Kolesnikova Olga, and Kalita Jugal. 2023. First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 534–539, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages (Atnafu Lambebo et al., ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.49.pdf