CLPsych2019 Shared Task: Predicting Suicide Risk Level from Reddit Posts on Multiple Forums

Victor Ruiz; Lingyun Shi; Wei Quan; Neal Ryan; Candice Biernesser; David Brent; Rich Tsui

doi:10.18653/v1/W19-3020

Abstract

We aimed to predict an individual’s suicide risk level from longitudinal posts on Reddit discussion forums. As participants in the CLPsych2019 shared task, we received two annotated datasets: a training dataset with 496 users (31,553 posts) and a test dataset with 125 users (9,610 posts). We submitted results from our three best-performing machine-learning models: a support vector machine (SVM), Naïve Bayes (NB), and an ensemble model. Each model assigned a user’s suicide risk level to one of four categories: no risk, low risk, moderate risk, or severe risk. Among the three models, the ensemble model achieved the best macro-averaged F1 score (0.379) on the holdout test dataset. The NB model performed best on two additional binary-classification tasks: no risk vs. flagged risk (any risk level other than no risk), with an F1 score of 0.836, and no or low risk vs. urgent risk (moderate or severe risk), with an F1 score of 0.736. We conclude that the NB model may serve as a tool for identifying users with flagged or urgent suicide risk based on longitudinal posts on Reddit discussion forums.
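The three scores named in the abstract can be illustrated with a minimal sketch. It assumes the four risk levels are encoded as the strings `"no"`, `"low"`, `"moderate"`, and `"severe"` (a hypothetical encoding, not necessarily the one used in the shared-task data), and that the binary F1 scores are computed for the positive class (flagged or urgent). The toy labels below are invented for illustration and are not results from the paper.

```python
from typing import List, Sequence

# Hypothetical string encoding of the four risk categories.
LEVELS = ["no", "low", "moderate", "severe"]


def f1_for_class(y_true: Sequence, y_pred: Sequence, positive) -> float:
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores over the four levels."""
    return sum(f1_for_class(y_true, y_pred, c) for c in LEVELS) / len(LEVELS)


def to_flagged(levels: Sequence[str]) -> List[bool]:
    """Flagged risk: any risk level other than no risk."""
    return [lvl != "no" for lvl in levels]


def to_urgent(levels: Sequence[str]) -> List[bool]:
    """Urgent risk: moderate or severe, as opposed to no or low risk."""
    return [lvl in ("moderate", "severe") for lvl in levels]


# Toy example (invented labels, one entry per user).
y_true = ["no", "low", "moderate", "severe", "severe", "no"]
y_pred = ["no", "moderate", "moderate", "severe", "low", "low"]

four_way = macro_f1(y_true, y_pred)
flagged = f1_for_class(to_flagged(y_true), to_flagged(y_pred), True)
urgent = f1_for_class(to_urgent(y_true), to_urgent(y_pred), True)
```

Because macro-averaging weights every class equally, a model that never predicts a rare level correctly is penalized heavily, which is one reason a four-way macro F1 can sit well below the binary flagged and urgent scores.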

Anthology ID: W19-3020
Volume: Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Kate Niederhoffer, Kristy Hollingshead, Philip Resnik, Rebecca Resnik, Kate Loveys
Venue: CLPsych
Publisher: Association for Computational Linguistics
Pages: 162–166
URL: https://aclanthology.org/W19-3020/
DOI: 10.18653/v1/W19-3020
Cite (ACL): Victor Ruiz, Lingyun Shi, Wei Quan, Neal Ryan, Candice Biernesser, David Brent, and Rich Tsui. 2019. CLPsych2019 Shared Task: Predicting Suicide Risk Level from Reddit Posts on Multiple Forums. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, pages 162–166, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): CLPsych2019 Shared Task: Predicting Suicide Risk Level from Reddit Posts on Multiple Forums (Ruiz et al., CLPsych 2019)
PDF: https://aclanthology.org/W19-3020.pdf
