Semi-Supervised Language Models for Identification of Personal Health Experiential from Twitter Data: A Case for Medication Effects

Minghao Zhu, Keyuan Jiang


Abstract
First-hand accounts of changes in one’s health condition, and an understanding of such accounts, can play an important role in advancing medical science and healthcare. Monitoring the safe use of medications is an important task of pharmacovigilance, and consumers’ first-hand experience of the effects of medication intake can offer valuable insight into how the human body reacts to medications. Social media have been considered a possible alternative data source for gathering personal experience with medications posted by users. Identifying personal experience tweets is a challenging classification task, and efforts have been made to tackle it using supervised approaches that require annotated data. Unlabeled Twitter data are abundant, and being able to use such data for training without degrading classification performance is of great value because it can reduce the cost of the laborious annotation process. We investigated two semi-supervised learning methods, pseudo-label and consistency regularization, with different mixes of labeled and unlabeled data in the training set, to understand the impact on classification performance. Our results show that both methods produced a noticeable improvement in F1 score when the labeled set was small, and that consistency regularization could still provide a small gain even when a larger labeled set was used.
Anthology ID:
2021.bionlp-1.25
Volume:
Proceedings of the 20th Workshop on Biomedical Language Processing
Month:
June
Year:
2021
Address:
Online
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
228–237
URL:
https://aclanthology.org/2021.bionlp-1.25
DOI:
10.18653/v1/2021.bionlp-1.25
Cite (ACL):
Minghao Zhu and Keyuan Jiang. 2021. Semi-Supervised Language Models for Identification of Personal Health Experiential from Twitter Data: A Case for Medication Effects. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 228–237, Online. Association for Computational Linguistics.
Cite (Informal):
Semi-Supervised Language Models for Identification of Personal Health Experiential from Twitter Data: A Case for Medication Effects (Zhu & Jiang, BioNLP 2021)
PDF:
https://aclanthology.org/2021.bionlp-1.25.pdf