Identifying Depression on Reddit: The Effect of Training Data

Inna Pirina, Çağrı Çöltekin


Abstract
This paper presents a set of classification experiments for identifying depression in posts gathered from social media platforms. In addition to data gathered previously by other researchers, we collect new data from the social media platform Reddit. Our experiments show promising results for identifying depression from social media texts. More importantly, however, we show that the choice of corpora is crucial in identifying depression, and that a poor choice of data can lead to misleading conclusions.
Anthology ID: W18-5903
Volume: Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Graciela Gonzalez-Hernandez, Davy Weissenbacher, Abeed Sarker, Michael Paul
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 9–12
URL: https://aclanthology.org/W18-5903
DOI: 10.18653/v1/W18-5903
Cite (ACL): Inna Pirina and Çağrı Çöltekin. 2018. Identifying Depression on Reddit: The Effect of Training Data. In Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task, pages 9–12, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Identifying Depression on Reddit: The Effect of Training Data (Pirina & Çöltekin, EMNLP 2018)
PDF: https://aclanthology.org/W18-5903.pdf