Apurbalal Senapati
2025
An Attention-Based Neural Translation System for English to Bodo
Subhash Wary | Birhang Borgoyary | Akher Ahmed | Mohanji Sah | Apurbalal Senapati
Proceedings of the Tenth Conference on Machine Translation
Bodo is a resource-scarce indigenous language belonging to the Tibeto-Burman family. It is mainly spoken in the north-east region of India, where it has both linguistic and cultural significance. Only a limited number of resources and tools are available for the language. This paper presents a study of neural machine translation for the English-Bodo language pair. The system is developed on a relatively small parallel corpus provided by the Low-Resource Indic Language Translation task as part of WMT-2025. The system is evaluated by the WMT-2025 organizers using evaluation metrics such as BLEU, METEOR, ROUGE-L, chrF and TER. The results are not encouraging, but they provide a foundation for further improvement.
2022
Generating Monolingual Dataset for Low Resource Language Bodo from old books using Google Keep
Sanjib Narzary | Maharaj Brahma | Mwnthai Narzary | Gwmsrang Muchahary | Pranav Kumar Singh | Apurbalal Senapati | Sukumar Nandi | Bidisha Som
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Bodo is a scheduled Indian language spoken largely by the Bodo community of Assam and other northeastern Indian states. Due to a lack of resources, it is difficult for young languages to communicate more effectively with the rest of the world. This leads to a lack of research in low-resource languages. The creation of a dataset is a tedious and costly process, particularly for languages with no participatory research. This is more visible for languages that are young and have recently adopted standard writing scripts. In this paper, we present a methodology using Google Keep for OCR to generate a monolingual Bodo corpus from different books. In this work, a Bodo text corpus of 192,327 tokens and 32,268 unique tokens is generated using free, accessible, and daily-usable applications. Moreover, some essential characteristics of the Bodo language that are neglected by Natural Language Processing (NLP) researchers are discussed.
2013
GuiTAR-based Pronominal Anaphora Resolution in Bengali
Apurbalal Senapati | Utpal Garain
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Co-authors
- Akher Ahmed 1
- Birhang Borgoyary 1
- Maharaj Brahma 1
- Utpal Garain 1
- Gwmsrang Muchahary 1