Dr. Vishal Goyal
Also published as: Dr Vishal Goyal
2020
Language Identification and Normalization of Code Mixed English and Punjabi Text
Neetika Bansal
|
Dr. Vishal Goyal
|
Dr. Simpel Rani
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Code mixing is prevalent when users use two or more languages while communicating. It becomes more complex when users prefer romanized text to Unicode typing. The automatic processing of social media data has become one of popular areas of interest. Especially since COVID period the involvement of youngsters has attained heights. Walking with the pace our intended software deals with Language Identification and Normalization of English and Punjabi code mixed text. The software designed follows a pipeline which includes data collection, pre-processing, language identification, handling Out of Vocabulary words, normalization and transliteration of English- Punjabi text. After applying five-fold cross validation on the corpus, the accuracy of 96.8% is achieved on a trained dataset of around 80025 tokens. After the prediction of the tags: the slangs, contractions in the user input are normalized to their standard form. In addition, the words with Punjabi as predicted tags are transliterated to Punjabi.
Design and Implementation of Anaphora Resolution in Punjabi Language
Kawaljit Kaur
|
Dr Vishal Goyal
|
Dr Kamlesh Dutta
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Natural Language Processing (NLP) is the most attention-grabbing field of artificial intelligence. It focuses on the interaction between humans and computers. Through NLP we can make thec omputers recognize, decode and deduce the meaning ofhuman dialect splendidly. But there are numerous difficulties that are experienced in NLP and, Anaphora is one such issue. Anaphora emerges often in composed writings and oral talk. Anaphora Resolution is the process of finding antecedent of corresponding referent and is required in different applications of NLP.Appreciable works have been accounted for anaphora in English and different languages, but no work has been done in Punjabi Language. Through this paper we are enumerating the introduction of Anaphora Resolution in Punjabi language. The accuracy achieved for the system is 47%.