Researchers have used social media data for many years to gain insights into users’ mental health. However, most studies focus on classifying users as depressed or healthy, or on detecting suicidal thoughts. In this paper, we instead aim to extract the evidence that justifies a pre-assigned gold label. We use a suicidality dataset of Reddit posts labeled with suicide risk levels; the task is to use Large Language Models (LLMs) to extract the evidence in each post that justifies its label. Combining Meta’s Llama 7B with lexicons, we achieved a precision of 0.96.
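To illustrate the lexicon component of this approach, here is a minimal sketch of lexicon-based evidence extraction, where sentences containing suicide-related terms are selected as candidate evidence for the assigned label; the lexicon contents and function names are hypothetical, not the paper’s actual resources.

```python
# Minimal sketch: select sentences containing lexicon terms as
# candidate evidence for a pre-assigned suicide risk label.
# The lexicon below is an illustrative placeholder, not a real resource.
import re

SUICIDE_LEXICON = {"hopeless", "worthless", "end it", "can't go on"}

def extract_evidence(post: str, lexicon: set) -> list:
    """Return the sentences of `post` that contain a lexicon term."""
    sentences = re.split(r"(?<=[.!?])\s+", post)
    return [s for s in sentences
            if any(term in s.lower() for term in lexicon)]

post = "I feel hopeless lately. Work has been fine though."
print(extract_evidence(post, SUICIDE_LEXICON))
# ['I feel hopeless lately.']
```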
Social Anxiety Disorder (SAD) is a common condition, affecting a significant portion of the population. While research suggests that spending time in nature can alleviate anxiety, its specific impact on SAD remains unclear. This study explores the relationship between discussions of outdoor spaces and social anxiety on social media. We leverage transformer-based models and large language models (LLMs) to analyze a social media dataset focused on SAD, and develop three methods for predicting the effects of outdoor spaces on SAD. A two-stage pipeline classifier achieved the best results among our submissions, exceeding the baseline.
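A two-stage pipeline of this kind can be sketched as follows, assuming two fine-tuned text classifiers; the checkpoint names and label strings are illustrative placeholders, since the abstract does not name the models.

```python
# Hedged sketch of a two-stage pipeline: stage 1 decides whether a post
# discusses outdoor spaces; stage 2 predicts the reported effect on SAD.
# Checkpoint names and label strings are illustrative placeholders.
from transformers import pipeline

stage1 = pipeline("text-classification", model="user/outdoor-mention-clf")
stage2 = pipeline("text-classification", model="user/sad-effect-clf")

def predict_effect(post: str) -> str:
    """Return 'no_mention' or a stage-2 effect label such as 'positive'."""
    if stage1(post)[0]["label"] != "MENTIONS_OUTDOORS":
        return "no_mention"
    return stage2(post)[0]["label"]
```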
Depression is a highly prevalent condition recognized by the World Health Organization as a leading contributor to global disability. Many people suffering from depression express their thoughts and feelings on social media, which thus becomes a source of data for research in this domain. However, existing annotation schemes tailored to studying depression symptoms in social media data remain limited, and reliable, valid annotation guidelines are crucial for accurately measuring mental health conditions in such studies. This paper addresses this gap by presenting a novel depression annotation scheme and guidelines for detecting depression symptoms and their severity in social media text. Our approach leverages validated depression questionnaires and incorporates the expertise of psychologists and psychiatrists during scheme refinement. The resulting annotation scheme achieves high inter-rater agreement, demonstrating its suitability for depression assessment in social media contexts.
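Inter-rater agreement for such a scheme is typically quantified with a chance-corrected statistic; the abstract does not specify which one was used, so the sketch below uses Cohen’s kappa from scikit-learn on made-up symptom labels.

```python
# Illustrative computation of inter-rater agreement with Cohen's kappa.
# The symptom labels below are made-up examples, not real annotations.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["anhedonia", "none", "sleep", "anhedonia", "none"]
annotator_b = ["anhedonia", "none", "sleep", "none", "none"]

print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```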
Mental illness can significantly impact individuals’ quality of life. Analysing social media data to uncover potential mental health issues in individuals via their posts is a popular research direction. However, most studies focus on the classification of users suffering from depression versus healthy users, or on the detection of suicidal thoughts. In this paper, we instead aim to understand and model the linguistic changes that occur when users transition from a healthy to an unhealthy state. Addressing this gap could lead to better approaches for earlier depression detection when signs are not as obvious as in cases of severe depression or suicidal ideation. To achieve this goal, we collected the first dataset of textual posts by the same users before and after reportedly being diagnosed with depression. We then use this data to build multiple predictive models (based on SVM, Random Forests, BERT, RoBERTa, MentalBERT, GPT-3, GPT-3.5, Bard, and Alpaca) for the task of classifying user posts. Transformer-based models achieved the best performance, while off-the-shelf large language models proved less effective, producing random guesses (GPT and Bard) or hallucinations (Alpaca).
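As a sketch of one of the simpler baselines listed above, an SVM over TF-IDF features for the pre- versus post-diagnosis task could look like this; the example posts are placeholders for the collected dataset.

```python
# Sketch of an SVM baseline for classifying posts as written before (0)
# or after (1) a reported depression diagnosis. Data are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = ["post written before diagnosis", "post written after diagnosis",
         "another pre-diagnosis post", "another post-diagnosis post"]
labels = [0, 1, 0, 1]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(posts, labels)
print(clf.predict(["a new unseen post"]))
```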
Social media data have been used in research for many years to understand users’ mental health. In this paper, we use user-generated content to pursue two goals: the first is detecting moments of mood change over time in the Reddit timelines of users; the second is predicting the degree of suicide risk as a user-level classification task. We used different approaches to address longitudinal modelling as well as the severely imbalanced dataset. For the first task, BERT combined with undersampling performed best, outperforming LSTM and basic random forest models. For the second task, extracting suicide-related features from post text improved overall performance; in particular, adding the number of suicide-related words in a post as a feature improved accuracy by 17%.
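The suicide-word feature highlighted above can be sketched as follows, with a small illustrative word list standing in for the actual lexicon used in the paper.

```python
# Sketch of the handcrafted feature: the number of suicide-related
# words in a post. The word list is an illustrative stand-in.
import re

SUICIDE_WORDS = {"suicide", "die", "kill", "hopeless", "end"}

def suicide_word_count(post: str) -> int:
    """Count tokens in the post that appear in the word list."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return sum(token in SUICIDE_WORDS for token in tokens)

print(suicide_word_count("I feel hopeless and want to end it"))  # 2
```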