This paper describes team PVG’s AI Club’s approach to the Emotion Classification shared task held at WASSA 2022. This Track 2 sub-task focuses on building models which can predict a multi-class emotion label based on essays from news articles where a person, group or another entity is affected. Baseline transformer models have been demonstrating good results on sequence classification tasks, and we aim to improve this performance with the help of ensembling techniques, and by leveraging two variations of emotion-specific representations. We observe better results than our baseline models and achieve an accuracy of 0.619 and a macro F1 score of 0.520 on the emotion classification task.
Active research pertaining to the affective phenomenon of empathy and distress is invaluable for improving human-machine interaction. Predicting intensities of such complex emotions from textual data is difficult, as these constructs are deeply rooted in the psychological theory. Consequently, for better prediction, it becomes imperative to take into account ancillary factors such as the psychological test scores, demographic features, underlying latent primitive emotions, along with the text’s undertone and its psychological complexity. This paper proffers team PVG’s solution to the WASSA 2021 Shared Task on Predicting Empathy and Emotion in Reaction to News Stories. Leveraging the textual data, demographic features, psychological test score, and the intrinsic interdependencies of primitive emotions and empathy, we propose a multi-input, multi-task framework for the task of empathy score prediction. Here, the empathy score prediction is considered the primary task, while emotion and empathy classification are considered secondary auxiliary tasks. For the distress score prediction task, the system is further boosted by the addition of lexical features. Our submission ranked 1st based on the average correlation (0.545) as well as the distress correlation (0.574), and 2nd for the empathy Pearson correlation (0.517).
Since their inception, transformer-based language models have led to impressive performance gains across multiple natural language processing tasks. For Arabic, the current state-of-the-art results on most datasets are achieved by the AraBERT language model. Notwithstanding these recent advancements, sarcasm and sentiment detection persist to be challenging tasks in Arabic, given the language’s rich morphology, linguistic disparity and dialectal variations. This paper proffers team SPPU-AASM’s submission for the WANLP ArSarcasm shared-task 2021, which centers around the sarcasm and sentiment polarity detection of Arabic tweets. The study proposes a hybrid model, combining sentence representations from AraBERT with static word vectors trained on Arabic social media corpora. The proposed system achieves a F1-sarcastic score of 0.62 and a F-PN score of 0.715 for the sarcasm and sentiment detection tasks, respectively. Simulation results show that the proposed system outperforms multiple existing approaches for both the tasks, suggesting that the amalgamation of context-free and context-dependent text representations can help capture complementary facets of word meaning in Arabic. The system ranked second and tenth in the respective sub-tasks of sarcasm detection and sentiment identification.
With mental health as a problem domain in NLP, the bulk of contemporary literature revolves around building better mental illness prediction models. The research focusing on the identification of discussion clusters in online mental health communities has been relatively limited. Moreover, as the underlying methodologies used in these studies mainly conform to the traditional machine learning models and statistical methods, the scope for introducing contextualized word representations for topic and theme extraction from online mental health communities remains open. Thus, in this research, we propose topic-infused deep contextualized representations, a novel data representation technique that uses autoencoders to combine deep contextual embeddings with topical information, generating robust representations for text clustering. Investigating the Reddit discourse on Post-Traumatic Stress Disorder (PTSD) and Complex Post-Traumatic Stress Disorder (C-PTSD), we elicit the thematic clusters representing the latent topics and themes discussed in the r/ptsd and r/CPTSD subreddits. Furthermore, we also present a qualitative analysis and characterization of each cluster, unraveling the prevalent discourse themes.