Vipul Mishra


2021

BennettNLP at SemEval-2021 Task 5: Toxic Spans Detection using Stacked Embedding Powered Toxic Entity Recognizer
Harsh Kataria | Ambuje Gupta | Vipul Mishra
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

With the rapid growth of technology, social media activity has boomed across all age groups. It is humanly impossible to manually check whether every tweet, comment and status update follows community guidelines, and toxic content is posted on these platforms regularly. This research aims to find the toxic words in a sentence so that a healthy social community can be built across the globe and users receive censored content with specific warnings and facts. To solve this challenging problem, the authors combine a linked-list-based pre-processing step with stacked embeddings (BERT embeddings, Flair embeddings and Word2Vec) on the flairNLP framework. The model was evaluated with the F1 metric, and the authors report an F1 score of 0.74 on their test set.
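The abstract names F1 as the evaluation metric without giving its exact form. The following is a minimal sketch of one plausible reading, assuming toxic spans are represented as sets of character offsets and that F1 is computed per post and then averaged. The function names `span_f1` and `mean_span_f1` are hypothetical and are not taken from the paper.

```python
def span_f1(predicted, gold):
    """F1 between two collections of toxic character offsets for one post."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # assumption: an empty prediction for a non-toxic post is a perfect match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # covers an empty side as well as disjoint offsets
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(all_predicted, all_gold):
    """Average the per-post F1 over a dataset (assumed aggregation)."""
    scores = [span_f1(p, g) for p, g in zip(all_predicted, all_gold)]
    return sum(scores) / len(scores)


# Illustrative post: "you are a stupid idiot"
# "stupid" occupies offsets 10..15 and "idiot" occupies offsets 17..21.
gold = list(range(10, 16)) + list(range(17, 22))
predicted = list(range(17, 22))  # the system found only "idiot"
score = span_f1(predicted, gold)  # precision 1.0, recall 5/11
```

Scoring on character offsets rather than whole tokens gives partial credit when a system marks only part of a toxic span, as in the illustrative post above.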

2020

BennettNLP at SemEval-2020 Task 8: Multimodal sentiment classification Using Hybrid Hierarchical Classifier
Ambuje Gupta | Harsh Kataria | Souvik Mishra | Tapas Badal | Vipul Mishra
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Memotion analysis is an important subject in today’s world, which is dominated by social media. This paper presents the results and analysis of SemEval-2020 Task 8: Memotion Analysis by team Kraken, which qualified as winners for the task. The task involved performing multimodal sentiment analysis on memes commonly posted on social media. It comprised three subtasks: Task A was to classify the overall sentiment of a meme as positive, negative or neutral; Task B was to classify a meme into the types humour, sarcasm, offensive or motivation, where a meme could belong to more than one category; and Task C was to further quantify the classifications obtained in Task B. An imbalanced dataset of 6992 rows was used, containing images (memes), text (extracted via OCR) and annotations in 17 classes provided by the task organisers. In this paper, the authors propose a hybrid of a neural Naïve-Bayes Support Vector Machine and logistic regression to solve the multilevel 17-class classification problem. The approach achieved its best result in Task B, an F1 score of 0.70, where the authors were ranked third.
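The abstract does not spell out how the Naïve-Bayes component feeds the linear classifier. As a rough illustration only, the sketch below assumes the common NB-SVM formulation in which binary token features are scaled by a smoothed Naïve-Bayes log-count ratio before a linear model (an SVM or logistic regression) is trained on them. The helper names `log_count_ratio` and `nb_features` and the toy captions are hypothetical, and the linear model itself is omitted.

```python
import math
from collections import Counter


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: log-count ratio} for binary labels (1 = positive class).

    docs is a list of token lists; alpha is the additive smoothing constant.
    """
    vocab = {tok for doc in docs for tok in doc}
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # Binarized features: each token counts once per document.
        (pos if label == 1 else neg).update(set(doc))
    p = {t: alpha + pos[t] for t in vocab}
    q = {t: alpha + neg[t] for t in vocab}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {t: math.log((p[t] / p_norm) / (q[t] / q_norm)) for t in vocab}


def nb_features(doc, ratios):
    """Scale a document's binary token indicators by the log-count ratios."""
    return {t: ratios[t] for t in set(doc) if t in ratios}


# Toy OCR captions (invented for illustration): 1 = humorous, 0 = not humorous.
docs = [["funny", "meme"], ["sad", "meme"]]
labels = [1, 0]
ratios = log_count_ratio(docs, labels)
features = nb_features(["funny", "meme", "unseen"], ratios)
```

Tokens that appear equally often in both classes receive a ratio of zero and so contribute nothing to the linear model, while class-specific tokens are pushed towards positive or negative values. For a multi-label setting such as Task B, one such binary model per category would be a natural extension, though the abstract does not confirm that design.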