Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis

Amey Hengle, Atharva Kulkarni, Shantanu Deepak Patankar, Madhumitha Chandrasekaran, Sneha D’silva, Jemima S. Jacob, Rashmi Gupta


Abstract
In this study, we introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. Unlike contemporary datasets that often oversimplify the intricate interplay between different mental health disorders by treating them as isolated conditions, ANGST enables multi-label classification, allowing each post to be simultaneously identified as indicating depression and/or anxiety. Comprising 2876 posts meticulously annotated by expert psychologists and an additional 7667 silver-labeled posts, ANGST offers a more representative sample of online mental health discourse. Moreover, we benchmark ANGST against various state-of-the-art language models, ranging from Mental-BERT to GPT-4. Our results provide significant insights into the capabilities and limitations of these models in complex diagnostic scenarios. While GPT-4 generally outperforms the other models, none achieves an F1 score exceeding 72% in multi-class comorbid classification, underscoring the ongoing challenges of applying language models to mental health diagnostics.
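The multi-label setup described above, where each post carries independent binary labels for depression and anxiety and a comorbid post carries both, is typically scored with a macro-averaged F1 over the label set. The sketch below is purely illustrative (not the authors' code); the toy `gold`/`pred` vectors are hypothetical and only show how such an evaluation is computed.

```python
def f1(tp, fp, fn):
    """Per-label F1 from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred):
    """Macro-averaged F1 over the label set (here: [depression, anxiety])."""
    n_labels = len(gold[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for g, p in zip(gold, pred) if g[j] == 1 and p[j] == 1)
        fp = sum(1 for g, p in zip(gold, pred) if g[j] == 0 and p[j] == 1)
        fn = sum(1 for g, p in zip(gold, pred) if g[j] == 1 and p[j] == 0)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / n_labels

# Hypothetical label vectors [depression, anxiety]: the first post is comorbid.
gold = [[1, 1], [1, 0], [0, 1], [0, 0]]
pred = [[1, 1], [1, 1], [0, 1], [0, 0]]
print(round(macro_f1(gold, pred), 3))  # prints 0.9
```

Macro-averaging weights the two disorders equally regardless of how often each appears, which matters when comorbid posts are a minority of the data.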
Anthology ID:
2024.emnlp-main.931
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16698–16721
URL:
https://aclanthology.org/2024.emnlp-main.931/
DOI:
10.18653/v1/2024.emnlp-main.931
Cite (ACL):
Amey Hengle, Atharva Kulkarni, Shantanu Deepak Patankar, Madhumitha Chandrasekaran, Sneha D’silva, Jemima S. Jacob, and Rashmi Gupta. 2024. Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 16698–16721, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis (Hengle et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.931.pdf
Software:
2024.emnlp-main.931.software.zip