Sentence Classification with Imbalanced Data for Health Applications

Farhana Ferdousi Liza


Abstract
Identifying and extracting reports of medications, their abuse, or their adverse effects from social media is a challenging task. In social media, relevant reports are very infrequent, which causes an imbalanced class distribution for machine learning algorithms. Learning algorithms are typically designed to optimize overall accuracy without considering the relative distribution of each class. Thus, an imbalanced class distribution is problematic, as learning algorithms have low predictive accuracy for the infrequent class. Moreover, social media exhibits natural linguistic variation in the form of creative language expressions. In this paper, we use a combination of data balancing and neural language representation techniques to address these challenges. Specifically, we participated in shared tasks 1, 2 (all languages), 3 (span detection only; no normalization was attempted), and 4 of Social Media Mining for Health Applications (SMM4H) 2020 (Klein et al., 2020). The results show that, with the proposed methodology, recall scores are higher than precision scores for the shared tasks. The recall scores are also higher than the mean score of all submissions. However, the F1-score is lower than the mean score except for task 2 (French).
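
The effect of class imbalance on accuracy, and one simple form of data balancing, can be sketched as follows. This is a minimal illustration on synthetic labels, not the paper's method: the abstract does not say which balancing technique was used, so random oversampling is an assumption here, and the helper names (accuracy, precision_recall_f1, random_oversample) are hypothetical.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the (infrequent) positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority examples until every class
    has as many examples as the largest class (assumed technique)."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Synthetic data: 5 relevant posts (label 1) among 100 (assumed ratio).
labels = [1] * 5 + [0] * 95
texts = [f"post {i}" for i in range(100)]

# A classifier that always predicts the majority class looks accurate
# but never finds a relevant post.
majority_pred = [0] * 100
print(accuracy(labels, majority_pred))
print(precision_recall_f1(labels, majority_pred))

# After oversampling, both classes are equally represented for training.
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In this toy setting the always-negative classifier scores high accuracy with zero recall on the infrequent class, which is why recall, precision and F1 are reported for the shared tasks rather than accuracy alone.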
Anthology ID:
2020.smm4h-1.25
Volume:
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Graciela Gonzalez-Hernandez, Ari Z. Klein, Ivan Flores, Davy Weissenbacher, Arjun Magge, Karen O'Connor, Abeed Sarker, Anne-Lyse Minard, Elena Tutubalina, Zulfat Miftahutdinov, Ilseyar Alimova
Venue:
SMM4H
Publisher:
Association for Computational Linguistics
Pages:
138–145
URL:
https://aclanthology.org/2020.smm4h-1.25
Cite (ACL):
Farhana Ferdousi Liza. 2020. Sentence Classification with Imbalanced Data for Health Applications. In Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, pages 138–145, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Sentence Classification with Imbalanced Data for Health Applications (Liza, SMM4H 2020)
PDF:
https://aclanthology.org/2020.smm4h-1.25.pdf
Data
SMM4H