Bhojpuri WordNet: Problems in Translating Hindi Synsets into Bhojpuri

Imran Ali, Praveen Gatla


Abstract
Today's artificial intelligence systems are highly capable, but they lack the human-like capacity for understanding. Sense-based lexical resources are therefore a requirement for such systems. Wordnets have received scholarly attention because they are regarded as crucial sense-based resources in the field of natural language understanding. Because they are organized around concepts rather than words, they can help identify the intended meaning of a text. Wordnets are available for only 18 Indian languages. With this in mind, we have initiated the development of a comprehensive wordnet for Bhojpuri. The present paper describes the creation of Bhojpuri synsets and discusses the problems we faced while translating Hindi synsets into Bhojpuri, including lexical anomalies, lexical mismatches, synthesized forms, and a lack of technical words. Nearly 4000 Hindi synsets were mapped to their equivalent synsets in Bhojpuri following the expansion approach. We have also worked on language-specific synsets, which are unique to Bhojpuri. This resource is useful in machine translation, sentiment analysis, word sense disambiguation, cross-lingual references among Indian languages, and Bhojpuri language teaching and learning.
Anthology ID:
2023.ranlp-1.7
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
60–68
URL:
https://aclanthology.org/2023.ranlp-1.7
Cite (ACL):
Imran Ali and Praveen Gatla. 2023. Bhojpuri WordNet: Problems in Translating Hindi Synsets into Bhojpuri. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 60–68, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Bhojpuri WordNet: Problems in Translating Hindi Synsets into Bhojpuri (Ali & Gatla, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.7.pdf