Vinuthkumar Prasan


2019

pdf bib
Robust Text Classification using Sub-Word Information in Input Word Representations.
Bhanu Prakash Mahanti | Priyank Chhipa | Vivek Sridhar | Vinuthkumar Prasan
Proceedings of the 16th International Conference on Natural Language Processing

Word based deep learning approaches have been used with increasing success recently to solve Natural Language Processing problems like Machine Translation, Language Modelling and Text Classification. However, performance of these word based models is limited by the vocabulary of the training corpus. Alternate approaches using character based models have been proposed to overcome the unseen word problems arising for a variety of reasons. However, character based models fail to capture the sequential relationship of words inherently present in texts. Hence, there is scope for improvement by addressing the unseen word problem while also maintaining the sequential context through word based models. In this work, we propose a method where the input embedding vector incorporates sub-word information but is also suitable for use with models which successfully capture the sequential nature of text. We further attempt to establish that using such a word representation as input makes the model robust to unseen words, particularly arising due to tokenization and spelling errors, which is a common problem in systems where a typing interface is one of the input modalities.

pdf bib
Robust Deep Learning Based Sentiment Classification of Code-Mixed Text
Siddhartha Mukherjee | Vinuthkumar Prasan | Anish Nediyanchath | Manan Shah | Nikhil Kumar
Proceedings of the 16th International Conference on Natural Language Processing

India is one of unique countries in the world that has the legacy of diversity of languages. Most of these languages are influenced by English. This causes a large presence of code-mixed text in Social Media. Enormous presence of this code-mixed text provides an important research area for Natural Language Processing (NLP). This paper proposes a novel Attention based deep learning technique for Sentiment Classification on Code-Mixed Text (ACCMT) of Hindi-English. The proposed architecture uses fusion of character and word features. Non availability of suitable Word Embedding to represent these Code-Mixed texts is another important hurdle for this league of NLP tasks. This paper also proposes a novel technique for preparing Word Embedding of Code-Mixed text. This embedding is prepared with two separately trained word-embedding on Romanized Hindi and English respectively. This embedding is further used in the proposed deep learning based architecture for robust classification. The Proposed technique achieves 71.97% accuracy, which exceeds the baseline accuracy.