Correlating Twitter Language with Community-Level Health Outcomes

Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi


Abstract
We study how language on social media is linked to fatal diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these predictions to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.
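The pipeline described in the abstract (sentence embeddings, then regression, then clustering) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: tweet embeddings are mean-pooled per community (county), the "regression model" is closed-form ridge regression, individual tweets are scored with the county-level model, and the "clustering" step is plain k-means over the highest-scoring tweets. Random synthetic vectors stand in for real sentence embeddings and health outcomes, and all function names (`county_features`, `fit_ridge`, `kmeans`) are hypothetical. The published code lives in the repository listed below; this sketch is not that implementation.

```python
import numpy as np

def county_features(tweet_embeddings, county_ids, n_counties):
    """Assumed aggregation: average the embeddings of all tweets per county."""
    sums = np.zeros((n_counties, tweet_embeddings.shape[1]))
    counts = np.zeros(n_counties)
    np.add.at(sums, county_ids, tweet_embeddings)  # unbuffered per-county sum
    np.add.at(counts, county_ids, 1)
    return sums / np.maximum(counts, 1)[:, None]

def fit_ridge(X, y, alpha=1.0):
    """Assumed regression model: closed-form ridge with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def kmeans(X, k, n_iter=50, seed=0):
    """Assumed clustering step: Lloyd's k-means with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Synthetic stand-ins (hypothetical): each county has a latent language profile,
# its tweets are noisy copies of it, and the outcome is linear in the profile.
rng = np.random.default_rng(42)
n_counties, dim, n_tweets = 50, 16, 4000
latent = rng.normal(size=(n_counties, dim))
county_ids = rng.integers(0, n_counties, size=n_tweets)
tweets = latent[county_ids] + rng.normal(size=(n_tweets, dim))
outcome = latent @ rng.normal(size=dim) + rng.normal(scale=0.1, size=n_counties)

# 1) Community level: regress the outcome on pooled county embeddings.
X = county_features(tweets, county_ids, n_counties)
w, b = fit_ridge(X, outcome, alpha=1.0)
r = np.corrcoef(X @ w + b, outcome)[0, 1]  # in-sample only; use held-out counties in practice

# 2) Individual level: score single tweets with the county model,
# 3) then cluster the top-scoring 10% to inspect recurring themes.
tweet_scores = tweets @ w + b
top = tweets[tweet_scores >= np.quantile(tweet_scores, 0.9)]
labels, centers = kmeans(top, k=5)
print(f"in-sample Pearson r = {r:.2f}; clustered {len(top)} high-scoring tweets")
```

With real data, one would replace the synthetic arrays with sentence embeddings of geolocated tweets and a county-level outcome table, then read the tweets nearest each cluster centre to interpret what language the model associates with the outcome.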
Anthology ID:
W19-3210
Volume:
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Davy Weissenbacher, Graciela Gonzalez-Hernandez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
71–78
URL:
https://aclanthology.org/W19-3210
DOI:
10.18653/v1/W19-3210
Cite (ACL):
Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, and Martin Jaggi. 2019. Correlating Twitter Language with Community-Level Health Outcomes. In Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task, pages 71–78, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Correlating Twitter Language with Community-Level Health Outcomes (Schneuwly et al., ACL 2019)
PDF:
https://aclanthology.org/W19-3210.pdf
Code:
epfml/correlating-tweets