Classifying Social Media Users before and after Depression Diagnosis via Their Language Usage: A Dataset and Study

Falwah Alhamed, Julia Ive, Lucia Specia


Abstract
Mental illness can significantly impact individuals’ quality of life. Analysing social media data to uncover potential mental health issues in individuals via their posts is a popular research direction. However, most studies focus on the classification of users suffering from depression versus healthy users, or on the detection of suicidal thoughts. In this paper, we instead aim to understand and model linguistic changes that occur when users transition from a healthy to an unhealthy state. Addressing this gap could lead to better approaches for earlier depression detection when signs are not as obvious as in cases of severe depression or suicidal ideation. In order to achieve this goal, we have collected the first dataset of textual posts by the same users before and after reportedly being diagnosed with depression. We then use this data to build multiple predictive models (based on SVM, Random Forests, BERT, RoBERTa, MentalBERT, GPT-3, GPT-3.5, Bard, and Alpaca) for the task of classifying user posts. Transformer-based models achieved the best performance, while large language models used off-the-shelf proved less effective as they produced random guesses (GPT and Bard) or hallucinations (Alpaca).
Anthology ID:
2024.lrec-main.289
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
3250–3260
URL:
https://aclanthology.org/2024.lrec-main.289
Cite (ACL):
Falwah Alhamed, Julia Ive, and Lucia Specia. 2024. Classifying Social Media Users before and after Depression Diagnosis via Their Language Usage: A Dataset and Study. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3250–3260, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Classifying Social Media Users before and after Depression Diagnosis via Their Language Usage: A Dataset and Study (Alhamed et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.289.pdf