Sayantan Mahinder


2020

Social Media Attributions in the Context of Water Crisis
Rupak Sarkar | Sayantan Mahinder | Hirak Sarkar | Ashiqur KhudaBukhsh
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Attribution of natural disasters and collective misfortune is a widely studied problem in political science. However, such studies typically rely on surveys, expert opinions, or external signals such as voting outcomes. In this paper, we explore the viability of using unstructured, noisy social media data to complement traditional surveys by automatically extracting attribution factors. We present a novel prediction task, attribution tie detection, which identifies the factors (e.g., poor city planning, exploding population) held responsible for the crisis in a social media document. We focus on the 2019 Chennai water crisis, which rapidly escalated into a discussion topic of global importance following alarming water-crisis statistics. On a challenging data set constructed from YouTube comments (72,098 comments posted by 43,859 users on 623 videos relevant to the crisis), we present a neural baseline for identifying attribution ties that achieves reasonable performance (accuracy: 87.34% on attribution detection and 81.37% on attribution resolution). We release the first annotated data set of 2,500 comments in this important domain.

The Non-native Speaker Aspect: Indian English in Social Media
Rupak Sarkar | Sayantan Mahinder | Ashiqur KhudaBukhsh
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

As the largest institutionalized second-language variety of English, Indian English has received sustained attention from linguists for decades. However, to the best of our knowledge, no prior study has contrasted web expressions of Indian English in noisy social media with English generated by a social media user base consisting predominantly of native speakers. In this paper, we address this gap in the literature by conducting a comprehensive analysis of multiple structural and semantic aspects. In addition, we propose a novel application of language models to perform automatic linguistic quality assessment.