Prateek Agrawal


2020

A SANSKRIT TO HINDI LANGUAGE MACHINE TRANSLATOR USING RULE BASED APPROACH
Prateek Agrawal | Vishu Madaan
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations

Hindi and Sanskrit share the same script, Devanagari, which results in a few basic similarities in their grammar rules. Hindi ranks fourth in the world by number of speakers, and over 60 million people in India are Hindi internet users. India alone has approximately 120 languages and 240 mother tongues, but only a few of them are recognized worldwide, while the others are steadily losing their place in society. Sanskrit is one such important language that is being neglected. According to the 2001 Census of India, fewer than 15,000 citizens reported Sanskrit as their mother tongue or preferred medium of communication. A key reason for the poor acceptance of Sanskrit is the language barrier among the Indian masses and a general lack of knowledge about the language. Our aim, therefore, is to connect the large population of Hindi users with Sanskrit and to familiarize them with at least its basics. We developed a translation tool that parses Sanskrit prose word by word and translates it into Hindi in the following steps: (i) We created a Hindi-Sanskrit corpus that handles Sanskrit words effectively and efficiently. (ii) We proposed a stemming algorithm for Sanskrit that strips prefixes and suffixes from words to recover their noun and verb roots. (iii) After stemming, we developed an algorithm that searches the corpus for the Hindi equivalents of the stemmed words, based on semantic analysis. (iv) We developed a semantic analysis algorithm for word translation that helps the tool identify the required grammatical parameters, such as gender, number, and case. (v) Next, we developed a discourse integration algorithm that separates the translated sentences based on subject/noun dependency. (vi) We then implemented a pragmatic analysis algorithm that validates the translated Hindi sentences syntactically and semantically. (vii) We further extended the work to summarize the translated story and suggest a suitable heading or title. For this, we used the Ripple Down Rules-based part-of-speech (RDR-POS) tagger to tag words in the POS tagger corpora. (viii) We proposed a title generation algorithm that suggests suitable titles for the translated text. (ix) Finally, we integrated all phases into a single translation tool that takes a story of at most one hundred words as input and translates it into Hindi.
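The stemming and corpus-lookup steps (ii) and (iii), together with the 100-word input limit from step (ix), can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the suffix table `SUFFIX_RULES`, the toy lexicon `LEXICON`, and the functions `stem` and `translate_words` are hypothetical, and they cover only a handful of a-stem noun endings and present-tense verb endings written in Devanagari. The stemmer tries the longest suffix first and accepts a cut only if the remaining stem is found in the lexicon, which keeps it from over-stripping; the feature dictionary attached to each suffix is a crude stand-in for the gender/number/case details of step (iv).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical suffix table for illustration only: (suffix, grammatical features).
# Sorted longest-first so that, e.g., "ाः" is tried before "ः".
SUFFIX_RULES = sorted(
    [
        ("न्ति", {"person": 3, "number": "plural", "tense": "present"}),
        ("ति", {"person": 3, "number": "singular", "tense": "present"}),
        ("ाः", {"case": "nominative", "number": "plural"}),
        ("ः", {"case": "nominative", "number": "singular"}),
        ("म्", {"case": "accusative", "number": "singular"}),
        ("ं", {"case": "accusative", "number": "singular"}),
    ],
    key=lambda rule: len(rule[0]),
    reverse=True,
)

# Hypothetical toy lexicon (Sanskrit stem -> Hindi gloss) standing in for the corpus.
LEXICON = {"बालक": "लड़का", "पुस्तक": "किताब", "पठ": "पढ़"}

MAX_WORDS = 100  # input limit stated in the abstract


@dataclass
class Token:
    surface: str        # word as it appears in the Sanskrit input
    stem: str           # stem left after suffix stripping
    hindi: str | None   # Hindi gloss, or None if the stem is unknown
    features: dict = field(default_factory=dict)  # features implied by the suffix


def stem(word: str) -> tuple[str, dict]:
    """Strip the longest known suffix whose remainder is a lexicon stem."""
    for suffix, features in SUFFIX_RULES:
        if word.endswith(suffix) and word[: -len(suffix)] in LEXICON:
            return word[: -len(suffix)], dict(features)
    return word, {}  # no rule applies: leave the word unchanged


def translate_words(text: str) -> list[Token]:
    """Stem each word and look up its Hindi gloss, one word at a time."""
    words = text.replace("।", " ").split()  # treat the danda as a separator
    if len(words) > MAX_WORDS:
        raise ValueError(f"input has {len(words)} words; the limit is {MAX_WORDS}")
    tokens = []
    for word in words:
        root, features = stem(word)
        tokens.append(Token(word, root, LEXICON.get(root), features))
    return tokens


if __name__ == "__main__":
    for tok in translate_words("बालकः पुस्तकं पठति।"):
        print(tok.surface, "->", tok.stem, tok.hindi, tok.features)
```

A practical stemmer would also need prefix (upasarga) stripping, sandhi splitting, and far more endings; the lexicon check shown here is just one simple way to reject spurious cuts.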
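Step (vii) relies on an RDR-POS tagger. The Python sketch below shows only how a Single Classification Ripple Down Rules (SCRDR) tree is evaluated: each rule node has an exception branch that is followed when the rule fires and an if-not branch that is followed when it does not, and the conclusion of the last rule that fired becomes the tag. The `Node` class, the `classify` and `tag` functions, and the three hand-written rules (including an exception for the particle इति, "thus") are illustrative assumptions; the actual RDR-POS tagger learns its rule tree from an annotated corpus and can condition on neighbouring words and tags, not just the word itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One rule in a Single Classification Ripple Down Rules (SCRDR) tree."""
    condition: Callable[[dict], bool]    # test on the word's context
    conclusion: str                      # tag assigned when the test succeeds
    except_child: Optional[Node] = None  # exception tried after this rule fires
    if_not_child: Optional[Node] = None  # alternative tried when it does not fire


def classify(node: Optional[Node], context: dict, default: str) -> str:
    """Walk the tree; the conclusion of the last rule that fired wins."""
    conclusion = default
    while node is not None:
        if node.condition(context):
            conclusion = node.conclusion
            node = node.except_child
        else:
            node = node.if_not_child
    return conclusion


# Hypothetical hand-written rules (a real RDR tagger learns them from a corpus):
# default NOUN; words ending in "ति" become VERB, except the particle "इति".
RULES = Node(
    condition=lambda ctx: True,
    conclusion="NOUN",
    except_child=Node(
        condition=lambda ctx: ctx["word"].endswith("ति"),
        conclusion="VERB",
        except_child=Node(lambda ctx: ctx["word"] == "इति", "PART"),
    ),
)


def tag(words: list[str], root: Node = RULES) -> list[tuple[str, str]]:
    """Tag each word independently, using only the word itself as context."""
    return [(w, classify(root, {"word": w}, default="UNK")) for w in words]


if __name__ == "__main__":
    print(tag(["बालकः", "पुस्तकं", "पठति", "इति"]))
```

The exception branch is what makes the approach incremental: a misclassified case such as इति is fixed by attaching a narrower rule beneath the rule that produced the error, without disturbing how other words are tagged.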