Monalisa Dey


2021

Classification of COVID19 tweets using Machine Learning Approaches
Anupam Mondal | Sainik Mahata | Monalisa Dey | Dipankar Das
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This work describes our participation in the “Classification of COVID19 tweets containing symptoms” shared task, organized by the Social Media Mining for Health Applications (SMM4H) workshop. It presents two machine learning approaches used to build a three-class classification system that categorizes COVID-19-related tweets into self-reports, non-personal reports, and literature/news mentions. The steps for pre-processing the tweets, extracting features, and developing the machine learning models are described in detail. When evaluated by the organizers, the two models achieved F1 scores of 0.93 and 0.92, respectively.
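As a rough illustration of the kind of pipeline the abstract outlines, the sketch below pre-processes tweets and trains a three-class classifier. The specific pre-processing steps, TF-IDF features, and the Logistic Regression model are assumptions for illustration only; the abstract does not name the two approaches actually used.

```python
# Minimal sketch of a three-class tweet classifier of the kind described
# above. Pre-processing, features, and model choice are illustrative
# assumptions, not the authors' exact pipeline.
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def preprocess(tweet: str) -> str:
    """Lowercase and strip URLs, @mentions, and extra whitespace."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", " ", tweet)  # remove URLs
    tweet = re.sub(r"@\w+", " ", tweet)          # remove @mentions
    return re.sub(r"\s+", " ", tweet).strip()


# TF-IDF over word unigrams/bigrams feeding a linear classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=preprocess, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy examples standing in for the annotated shared-task corpus.
tweets = [
    "I have had a fever and a dry cough since Monday",
    "My neighbour tested positive and lost his sense of smell",
    "New study reports loss of taste as a common COVID-19 symptom",
]
labels = ["self_report", "non_personal_report", "literature_news_mention"]

model.fit(tweets, labels)
print(model.predict(["my throat hurts and I can't stop coughing"]))
```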

2018

Summarization of Table Citations from Text
Monalisa Dey | Salma Mandi | Dipankar Das
Proceedings of the 15th International Conference on Natural Language Processing

2017

JUNLP at IJCNLP-2017 Task 3: A Rank Prediction Model for Review Opinion Diversification
Monalisa Dey | Anupam Mondal | Dipankar Das
Proceedings of the IJCNLP 2017, Shared Tasks

The IJCNLP-17 Review Opinion Diversification (RevOpiD-2017) task is designed to rank the top-k reviews of a product from a set of reviews, which assists in producing a summarized output that expresses the opinion of the entire review set. The task is divided into three independent subtasks, subtask-A, subtask-B, and subtask-C, which select the top-k reviews based on the helpfulness, representativeness, and exhaustiveness of the opinions expressed in the review set, respectively. To develop the modules and predict review ranks for all three subtasks, we employed two well-known supervised classifiers, Naïve Bayes and Logistic Regression, on top of several features extracted from the provided datasets, such as the number of nouns, number of verbs, and number of sentiment words. Finally, the organizers validated the predicted outputs for all three subtasks using their evaluation metrics. For list size 5, the scores are 0.80 (mth) for subtask-A; 0.86 (cos), 0.87 (cos_d), 0.71 (cpr), 4.98 (a-dcg), and 556.94 (wt) for subtask-B; and 10.94 (unwt) and 0.67 (recall) for subtask-C.
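As a loose illustration of the feature-and-classifier setup the abstract names, the sketch below counts nouns, verbs, and sentiment words per review with NLTK and fits both Naïve Bayes and Logistic Regression on those counts. The toy sentiment lexicon, the helpfulness labels, the GaussianNB variant, and the resource names downloaded are all assumptions; the paper's actual lexicon, feature set, and rank-prediction procedure are not reproduced here.

```python
# Sketch of the feature-plus-classifier setup named in the abstract:
# counts of nouns, verbs, and sentiment words fed to Naive Bayes and
# Logistic Regression. Lexicon, labels, and NB variant are assumptions.
import nltk
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Resource names differ across NLTK versions; missing ones are skipped.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

# Tiny illustrative sentiment lexicon; the paper's actual resource is unknown.
SENTIMENT_WORDS = {"good", "great", "bad", "terrible", "love", "hate"}


def extract_features(review: str) -> list:
    """Return [noun count, verb count, sentiment-word count] for a review."""
    tokens = nltk.word_tokenize(review)
    tags = nltk.pos_tag(tokens)
    nouns = sum(1 for _, t in tags if t.startswith("NN"))
    verbs = sum(1 for _, t in tags if t.startswith("VB"))
    sentiment = sum(1 for w in tokens if w.lower() in SENTIMENT_WORDS)
    return [nouns, verbs, sentiment]


reviews = [
    "I love this phone, the camera is great",
    "Terrible battery, the screen broke in a week",
    "It works as described and shipping was fast",
]
helpful = [1, 1, 0]  # toy helpfulness labels for illustration

X = np.array([extract_features(r) for r in reviews])
for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    clf.fit(X, helpful)
    print(type(clf).__name__, clf.predict(X))
```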