Unveiling Voices: Identification of Concerns in a Social Media Breast Cancer Cohort via Natural Language Processing

Swati Rajwal; Avinash Kumar Pandey; Zhishuo Han; Abeed Sarker

Abstract

We leveraged a dataset of ∼1.5 million Twitter (now X) posts to develop a framework for analyzing breast cancer (BC) patients’ concerns and possible reasons for treatment discontinuation. Our primary objectives were threefold: (1) to curate and collect data from a BC cohort; (2) to identify topics related to uncertainty/concerns in BC-related posts; and (3) to conduct a sentiment intensity analysis of posts to identify and analyze negatively polarized posts. RoBERTa outperformed other models with a micro-averaged F1 score of 0.894 and a macro-averaged F1 score of 0.853 for (1). For (2), we used GPT-4 and BERTopic, and qualitatively analyzed posts under relevant topics. For (3), sentiment intensity analysis of posts followed by qualitative analyses shed light on potential reasons behind treatment discontinuation. Our work demonstrates the utility of social media mining to discover BC patient concerns. Information derived from the cohort data may help design strategies in the future for increasing treatment compliance.
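The abstract reports both a micro-averaged and a macro-averaged F1 score for the classification step. As a minimal sketch of how the two averages differ, the snippet below computes both from scratch on a small, invented label set. The labels `positive` and `negative`, the toy data, and the helper name `f1_scores` are assumptions for illustration only and are not taken from the paper; the scores it yields are unrelated to the reported 0.894 and 0.853.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical, imbalanced toy data (not from the paper).
y_true = ["positive"] * 4 + ["negative"] * 2
y_pred = ["positive"] * 4 + ["positive", "negative"]

micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the macro average comes out lower than the micro average because the minority class is classified less accurately and each class counts equally in the macro mean. A gap in the same direction between the reported scores could arise the same way, although the abstract does not say so.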

Anthology ID: 2024.cl4health-1.32
Volume: Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
Venues: CL4Health | WS
Publisher: ELRA and ICCL
Pages: 264–270
URL: https://aclanthology.org/2024.cl4health-1.32/
Cite (ACL): Swati Rajwal, Avinash Kumar Pandey, Zhishuo Han, and Abeed Sarker. 2024. Unveiling Voices: Identification of Concerns in a Social Media Breast Cancer Cohort via Natural Language Processing. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 264–270, Torino, Italia. ELRA and ICCL.
Cite (Informal): Unveiling Voices: Identification of Concerns in a Social Media Breast Cancer Cohort via Natural Language Processing (Rajwal et al., CL4Health 2024)
PDF: https://aclanthology.org/2024.cl4health-1.32.pdf
