Emir Y. Haskovic


2020

SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing
Isabel Metzger | Emir Y. Haskovic | Allison Black | Whitley M. Yi | Rajat S. Chandra | Mark T. Rutledge | William McMahon | Yindalon Aphinyanaphongs
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

This paper presents our approach to multi-class text categorization of tweets that mention prescription medications, using natural language processing techniques to label each tweet as indicating potential abuse/misuse (A), consumption/non-abuse (C), mention-only (M), or an unrelated reference (U). Data augmentation increased our training and validation corpora from 13,172 tweets to 28,094 tweets. We also trained word embeddings on domain-specific social media and medical corpora. Our hybrid pipeline of an attention-based CNN with post-processing was the best-performing system in Task 4 of SMM4H 2020, with an F1 score of 0.51 for class A.