Do Models of Mental Health Based on Social Media Data Generalize?

Keith Harrigian, Carlos Aguirre, Mark Dredze


Abstract
Proxy-based methods for annotating mental health status in social media have grown popular in computational research due to their ability to gather large training samples. However, an emerging body of literature has raised new concerns regarding the validity of these types of methods for use in clinical applications. To further understand the robustness of distantly supervised mental health models, we explore the generalization ability of machine learning classifiers trained to detect depression in individuals across multiple social media platforms. Our experiments not only reveal that substantial loss occurs when transferring between platforms, but also that there exist several unreliable confounding factors that may enable researchers to overestimate classification performance. Based on these results, we enumerate recommendations for future mental health dataset construction.
Anthology ID:
2020.findings-emnlp.337
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | Findings
Publisher:
Association for Computational Linguistics
Pages:
3774–3788
URL:
https://aclanthology.org/2020.findings-emnlp.337
DOI:
10.18653/v1/2020.findings-emnlp.337
Cite (ACL):
Keith Harrigian, Carlos Aguirre, and Mark Dredze. 2020. Do Models of Mental Health Based on Social Media Data Generalize? In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3774–3788, Online. Association for Computational Linguistics.
Cite (Informal):
Do Models of Mental Health Based on Social Media Data Generalize? (Harrigian et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.337.pdf
Code
kharrigian/emnlp-2020-mental-health-generalization
Data
SMHD