Nishanth Sastry


2024

Revealing COVID-19’s Social Dynamics: Diachronic Semantic Analysis of Vaccine and Symptom Discourse on Twitter
Zeqiang Wang | Jiageng Wu | Yuqi Wang | Wei Wang | Jie Yang | Jon Johnson | Nishanth Sastry | Suparna De
Findings of the Association for Computational Linguistics: EMNLP 2024

Social media is recognized as an important source for deriving insights into public opinion dynamics and social impacts, due to the vast volume of textual data generated daily and the ‘unconstrained’ behavior of people interacting on these platforms. However, such analyses prove challenging because of the semantic shift phenomenon, in which word meanings evolve over time. This paper proposes an unsupervised dynamic word embedding method that captures longitudinal semantic shifts in social media data without predefined anchor words. The method leverages word co-occurrence statistics and dynamic updating to adapt embeddings over time, addressing the challenges of data sparseness, imbalanced distributions, and synergistic semantic effects. Evaluated on a large COVID-19 Twitter dataset, the method reveals semantic evolution patterns of vaccine- and symptom-related entities across different pandemic stages, along with their potential correlations with real-world statistics. Our key contributions include the dynamic embedding technique, an empirical analysis of COVID-19 semantic shifts, and a discussion of how to enhance semantic shift modeling for computational social science research. This study makes it possible to capture longitudinal semantic dynamics on social media in order to understand public discourse and collective phenomena.

2021

An Expert Annotated Dataset for the Detection of Online Misogyny
Ella Guest | Bertie Vidgen | Alexandros Mittos | Nishanth Sastry | Gareth Tyson | Helen Margetts
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Online misogyny is a pernicious social problem that risks making online platforms toxic and unwelcoming to women. We present a new hierarchical taxonomy for online misogyny, as well as an expert-labelled dataset to enable automatic classification of misogynistic content. The dataset consists of 6567 labels for Reddit posts and comments. As previous research has found that untrained crowdsourced annotators struggle to identify misogyny, we hired and trained annotators and provided them with robust annotation guidelines. We report baseline classification performance on the binary classification task, achieving an accuracy of 0.93 and an F1 of 0.43. The codebook and datasets are made freely available for future researchers.