Harsh Kataria


2023

pdf bib
NLP-Titan at SemEval-2023 Task 6: Identification of Rhetorical Roles Using Sequential Sentence Classification
Harsh Kataria | Ambuje Gupta
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The analysis of legal cases poses a considerable challenge for researchers, practitioners, and academicians due to the lengthy and intricate nature of these documents. Developing countries such as India are experiencing a significant increase in the number of pending legal cases, which are often unstructured and difficult to process using conventional methods. To address this issue, the authors have implemented a sequential sentence classification process, which categorizes legal documents into 13 segments, known as Rhetorical Roles. This approach enables the extraction of valuable insights from the various classes of the structured document. The performance of this approach was evaluated using the F1 score, which measures the model’s precision and recall. The authors’ approach achieved an F1 score of 0.83, which surpasses the baseline score of 0.79 established by the task organizers. The authors have combined sequential sentence classification and the SetFit method in a hierarchical manner by combining similar classes to achieve this score.

2021

pdf bib
BennettNLP at SemEval-2021 Task 5: Toxic Spans Detection using Stacked Embedding Powered Toxic Entity Recognizer
Harsh Kataria | Ambuje Gupta | Vipul Mishra
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

With the rapid growth in technology, social media activity has seen a boom across all age groups. It is humanly impossible to check all the tweets, comments and status manually whether they follow proper community guidelines. A lot of toxicity is regularly posted on these social media platforms. This research aims to find toxic words in a sentence so that a healthy social community is built across the globe and the users receive censored content with specific warnings and facts. To solve this challenging problem, authors have combined concepts of Linked List for pre-processing and then used the idea of stacked embeddings like BERT Embeddings, Flair Embeddings and Word2Vec on the flairNLP framework to get the desired results. F1 metric was used to evaluate the model. The authors were able to produce a 0.74 F1 score on their test set.

2020

pdf bib
BennettNLP at SemEval-2020 Task 8: Multimodal sentiment classification Using Hybrid Hierarchical Classifier
Ambuje Gupta | Harsh Kataria | Souvik Mishra | Tapas Badal | Vipul Mishra
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Memotion analysis is a very crucial and important subject in today’s world that is dominated by social media. This paper presents the results and analysis of the SemEval-2020 Task-8: Memotion analysis by team Kraken that qualified as winners for the task. This involved performing multimodal sentiment analysis on memes commonly posted over social media. The task comprised of 3 subtasks, Task A was to find the overall sentiment of a meme and classify it into positive, negative or neutral, Task B was to classify it into the different types which were namely humour, sarcasm, offensive or motivation where a meme could have more than one category, Task C was to further quantify the classifications achieved in task B. An imbalanced data of 6992 rows was utilized for this which contained images (memes), text (extracted OCR) and their annotations in 17 classes provided by the task organisers. In this paper, the authors proposed a hybrid neural Naïve-Bayes Support Vector Machine and logistic regression to solve a multilevel 17 class classification problem. It achieved the best result in Task B i.e 0.70 F1 score. The authors were ranked third in Task B.