Ana-Maria Bucur


Natural language processing as a tool to identify the Reddit particularities of cancer survivors around the time of diagnosis and remission: A pilot study
Ioana R. Podină | Ana-Maria Bucur | Diana Todea | Liviu Fodor | Andreea Luca | Liviu P. Dinu | Rareș Boian
Proceedings of the Fifth Workshop on Widening Natural Language Processing

In the current study, we analyzed 15297 texts from 39 cancer survivors who posted or commented on Reddit in order to detect the language particularities of cancer survivors from online discourse. We performed a computational linguistic analysis (part-of-speech analysis, emoji detection, sentiment analysis) on submissions around the time of the cancer diagnosis and around the time of remission. We found several significant differences in the texts posted around the time of remission compared to those around the time of diagnosis. Though our results need to be backed up by a higher corpus of data, they do cue to the fact that cancer survivors, around the time of remission, focus more on others, are more active on social media, and do not see the glass as half empty as suggested by the valence of the emojis.

pdf bib
Sequence-to-Sequence Lexical Normalization with Multilingual Transformers
Ana-Maria Bucur | Adrian Cosma | Liviu P. Dinu
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Current benchmark tasks for natural language processing contain text that is qualitatively different from the text used in informal day to day digital communication. This discrepancy has led to severe performance degradation of state-of-the-art NLP models when fine-tuned on real-world data. One way to resolve this issue is through lexical normalization, which is the process of transforming non-standard text, usually from social media, into a more standardized form. In this work, we propose a sentence-level sequence-to-sequence model based on mBART, which frames the problem as a machine translation problem. As the noisy text is a pervasive problem across languages, not just English, we leverage the multi-lingual pre-training of mBART to fine-tune it to our data. While current approaches mainly operate at the word or subword level, we argue that this approach is straightforward from a technical standpoint and builds upon existing pre-trained transformer networks. Our results show that while word-level, intrinsic, performance evaluation is behind other methods, our model improves performance on extrinsic, downstream tasks through normalization compared to models operating on raw, unprocessed, social media text.

pdf bib
An Exploratory Analysis of the Relation between Offensive Language and Mental Health
Ana-Maria Bucur | Marcos Zampieri | Liviu P. Dinu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
A Psychologically Informed Part-of-Speech Analysis of Depression in Social Media
Ana-Maria Bucur | Ioana R. Podina | Liviu P. Dinu
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this work, we provide an extensive part-of-speech analysis of the discourse of social media users with depression. Research in psychology revealed that depressed users tend to be self-focused, more preoccupied with themselves and ruminate more about their lives and emotions. Our work aims to make use of large-scale datasets and computational methods for a quantitative exploration of discourse. We use the publicly available depression dataset from the Early Risk Prediction on the Internet Workshop (eRisk) 2018 and extract part-of-speech features and several indices based on them. Our results reveal statistically significant differences between the depressed and non-depressed individuals confirming findings from the existing psychology literature. Our work provides insights regarding the way in which depressed individuals are expressing themselves on social media platforms, allowing for better-informed computational models to help monitor and prevent mental illnesses.