SMHD-GER: A Large-Scale Benchmark Dataset for Automatic Mental Health Detection from Social Media in German

Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz


Abstract
Mental health problems are a challenge to our modern society, and their prevalence is predicted to increase worldwide. Recently, a surge of research has demonstrated the potential of automated detection of mental health conditions (MHC) through social media posts, with the ultimate goal of enabling early intervention and monitoring population-level health outcomes in real time. Progress in this area of research is highly dependent on the availability of high-quality datasets and benchmark corpora. However, the publicly available datasets for understanding and modelling MHC are largely confined to the English language. In this paper, we introduce SMHD-GER (Self-Reported Mental Health Diagnoses for German), a large-scale, carefully constructed dataset for MHC detection built on high-precision patterns and the approach proposed for English. To facilitate further research, we provide benchmark models for this dataset and conduct extensive experiments. These models leverage engineered (psycho-)linguistic features as well as BERT-German. We also examine nuanced patterns of linguistic markers characteristic of specific MHC.
Anthology ID:
2023.findings-eacl.113
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1526–1541
URL:
https://aclanthology.org/2023.findings-eacl.113
DOI:
10.18653/v1/2023.findings-eacl.113
Cite (ACL):
Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, and Elma Kerz. 2023. SMHD-GER: A Large-Scale Benchmark Dataset for Automatic Mental Health Detection from Social Media in German. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1526–1541, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
SMHD-GER: A Large-Scale Benchmark Dataset for Automatic Mental Health Detection from Social Media in German (Zanwar et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-eacl.113.pdf
Video:
https://aclanthology.org/2023.findings-eacl.113.mp4