Social Media Mining for Health Applications Workshop (2020)

Volumes

Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task 33 papers

bib (full) Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

pdf bib abs
COVID-19 Twitter Monitor: Aggregating and Visualizing COVID-19 Related Trends in Social Media
Joseph Cornelius | Tilia Ellendorff | Lenz Furrer | Fabio Rinaldi

Social media platforms offer extensive information about the development of the COVID-19 pandemic and the current state of public health. In recent years, the Natural Language Processing community has developed a variety of methods to extract health-related information from posts on social media platforms. In order for these techniques to be used by a broad public, they must be aggregated and presented in a user-friendly way. We have aggregated ten methods to analyze tweets related to the COVID-19 pandemic, and present interactive visualizations of the results on our online platform, the COVID-19 Twitter Monitor. In the current version of our platform, we offer distinct methods for the inspection of the dataset, at different levels: corpus-wide, single post, and spans within each post. Besides, we allow the combination of different methods to enable a more selective acquisition of knowledge. Through the visual and interactive combination of various methods, interconnections in the different outputs can be revealed.

pdf bib abs
Conversation-Aware Filtering of Online Patient Forum Messages
Anne Dirkson | Suzan Verberne | Wessel Kraaij

Previous approaches to NLP tasks on online patient forums have been limited to single posts as units, thereby neglecting the overarching conversational structure. In this paper we explore the benefit of exploiting conversational context for filtering posts relevant to a specific medical topic. We experiment with two approaches to add conversational context to a BERT model: a sequential CRF layer and manually engineered features. Although neither approach can outperform the F1 score of the BERT baseline, we find that adding a sequential layer improves precision for all target classes whereas adding a non-sequential layer with manually engineered features leads to a higher recall for two out of three target classes. Thus, depending on the end goal, conversation-aware modelling may be beneficial for identifying relevant messages. We hope our findings encourage other researchers in this domain to move beyond studying messages in isolation towards more discourse-based data collection and classification. We release our code for the purpose of follow-up research.

Identifying patient information needs is an important issue for health care services and implementation of patient-centered care. A relevant number of people with diabetes mellitus experience a need for information during the course of the disease. Health-related online forums are a promising option for researching relevant information needs closely related to everyday life. In this paper, we present a novel data corpus comprising 4,664 contributions from an online diabetes forum in German language. Two annotation tasks were implemented. First, the contributions were categorised according to whether they contain a diabetes-specific information need or not, which might either be a non diabetes-specific information need or no information need at all, resulting in an agreement of 0.89 (Krippendorff’s α). Moreover, the textual content of diabetes-specific information needs was segmented and labeled using a well-founded definition of health-related information needs, which achieved a promising agreement of 0.82 (Krippendorff’s αu). We further report a baseline for two sub-tasks of the information extraction system planned for the long term: contribution categorization and segment classification.

The vast amount of data on social media presents significant opportunities and challenges for utilizing it as a resource for health informatics. The fifth iteration of the Social Media Mining for Health Applications (#SMM4H) shared tasks sought to advance the use of Twitter data (tweets) for pharmacovigilance, toxicovigilance, and epidemiology of birth defects. In addition to re-runs of three tasks, #SMM4H 2020 included new tasks for detecting adverse effects of medications in French and Russian tweets, characterizing chatter related to prescription medication abuse, and detecting self reports of birth defect pregnancy outcomes. The five tasks required methods for binary classification, multi-class classification, and named entity recognition (NER). With 29 teams and a total of 130 system submissions, participation in the #SMM4H shared tasks continues to grow.

pdf bib abs
Ensemble BERT for Classifying Medication-mentioning Tweets
Huong Dang | Kahyun Lee | Sam Henry | Özlem Uzuner

Twitter is a valuable source of patient-generated data that has been used in various population health studies. The first step in many of these studies is to identify and capture Twitter messages (tweets) containing medication mentions. In this article, we describe our submission to Task 1 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2020. This task challenged participants to detect tweets that mention medications or dietary supplements in a natural, highly imbalance dataset. Our system combined a handcrafted preprocessing step with an ensemble of 20 BERT-based classifiers generated by dividing the training dataset into subsets using 10-fold cross validation and exploiting two BERT embedding models. Our system ranked first in this task, and improved the average F1 score across all participating teams by 19.07% with a precision, recall, and F1 on the test set of 83.75%, 87.01%, and 85.35% respectively.

In this paper, we described our systems for the first and second subtasks of Social Media Mining for Health Applications (SMM4H) shared task in 2020. The two subtasks are automatic classi-fication of medication mentions and adverse effect in tweets. Our systems for both subtasks are based on Robustly optimized BERT approach (RoBERTa) and our previous work at SMM4H’19. The best F1-scores achieved by our systems for subtask 1 and 2 were 0.7974 and 0.64 respec-tively, which outperformed the average F1-scores among all teams’ best runs by at least 0.13.

pdf bib abs
BERT Implementation for Detecting Adverse Drug Effects Mentions in Russian
Andrey Gusev | Anna Kuznetsova | Anna Polyanskaya | Egor Yatsishin

This paper describes a system developed for the Social Media Mining for Health 2020 shared task. Our team participated in the second subtask for Russian language creating a system to detect adverse drug reaction presence in a text. For our submission, we exploited an ensemble model architecture, combining BERT’s extension for Russian language, Logistic Regression and domain-specific preprocessing pipeline. Our system was ranked first among others, achieving F-score of 0.51.

pdf bib abs
KFU NLP Team at SMM4H 2020 Tasks: Cross-lingual Transfer Learning with Pretrained Language Models for Drug Reactions
Zulfat Miftahutdinov | Andrey Sakhovskiy | Elena Tutubalina

This paper describes neural models developed for the Social Media Mining for Health (SMM4H) 2020 shared tasks. Specifically, we participated in two tasks. We investigate the use of a language representation model BERT pretrained on a large-scale corpus of 5 million health-related user reviews in English and Russian. The ensemble of neural networks for extraction and normalization of adverse drug reactions ranked first among 7 teams at the SMM4H 2020 Task 3 and obtained a relaxed F1 of 46%. The BERT-based multilingual model for classification of English and Russian tweets that report adverse reactions ranked second among 16 and 7 teams at two first subtasks of the SMM4H 2019 Task 2 and obtained a relaxed F1 of 58% on English tweets and 51% on Russian tweets.

This paper presents our approach to multi-class text categorization of tweets mentioning prescription medications as being indicative of potential abuse/misuse (A), consumption/non-abuse (C), mention-only (M), or an unrelated reference (U) using natural language processing techniques. Data augmentation increased our training and validation corpora from 13,172 tweets to 28,094 tweets. We also created word-embeddings on domain-specific social media and medical corpora. Our hybrid pipeline of an attention-based CNN with post-processing was the best performing system in task 4 of SMM4H 2020, with an F1 score of 0.51 for class A.

pdf bib abs
Automatic Detecting for Health-related Twitter Data with BioBERT
Yang Bai | Xiaobing Zhou

Social media used for health applications usually contains a large amount of data posted by users, which brings various challenges to NLP, such as spoken language, spelling errors, novel/creative phrases, etc. In this paper, we describe our system submitted to SMM4H 2020: Social Media Mining for Health Applications Shared Task which consists of five sub-tasks. We participate in subtask 1, subtask 2-English, and subtask 5. Our final submitted approach is an ensemble of various fine-tuned transformer-based models. We illustrate that these approaches perform well in imbalanced datasets (For example, the class ratio is 1:10 in subtask 2), but our model performance is not good in extremely imbalanced datasets (For example, the class ratio is 1:400 in subtask 1). Finally, in subtask 1, our result is lower than the average score, in subtask 2-English, our result is higher than the average score, and in subtask 5, our result achieves the highest score. The code is available online.

pdf bib abs
Exploring Online Depression Forums via Text Mining: A Comparison of Reddit and a Curated Online Forum
Luis Moßburger | Felix Wende | Kay Brinkmann | Thomas Schmidt

We present a study employing various techniques of text mining to explore and compare two different online forums focusing on depression: (1) the subreddit r/depression (over 60 million tokens), a large, open social media platform and (2) Beyond Blue (almost 5 million tokens), a professionally curated and moderated depression forum from Australia. We are interested in how the language and the content on these platforms differ from each other. We scrape both forums for a specific period. Next to general methods of computational text analysis, we focus on sentiment analysis, topic modeling and the distribution of word categories to analyze these forums. Our results indicate that Beyond Blue is generally more positive and that the users are more supportive to each other. Topic modeling shows that Beyond Blue’s users talk more about adult topics like finance and work while topics shaped by school or college terms are more prevalent on r/depression. Based on our findings we hypothesize that the professional curation and moderation of a depression forum is beneficial for the discussion in it.

pdf bib abs
Towards Preemptive Detection of Depression and Anxiety in Twitter
David Owen | Jose Camacho-Collados | Luis Espinosa Anke

Depression and anxiety are psychiatric disorders that are observed in many areas of everyday life. For example, these disorders manifest themselves somewhat frequently in texts written by nondiagnosed users in social media. However, detecting users with these conditions is not a straightforward task as they may not explicitly talk about their mental state, and if they do, contextual cues such as immediacy must be taken into account. When available, linguistic flags pointing to probable anxiety or depression could be used by medical experts to write better guidelines and treatments. In this paper, we develop a dataset designed to foster research in depression and anxiety detection in Twitter, framing the detection task as a binary tweet classification problem. We then apply state-of-the-art classification models to this dataset, providing a competitive set of baselines alongside qualitative error analysis. Our results show that language models perform reasonably well, and better than more traditional baselines. Nonetheless, there is clear room for improvement, particularly with unbalanced training sets and in cases where seemingly obvious linguistic cues (keywords) are used counter-intuitively.

The team from the University of Michigan participated in three tasks in the Social Media Mining for Health Applications (#SMM4H) 2020 shared tasks – on detecting mentions of adverse effects (Task 2), extracting and normalizing them (Task 3), and detecting mentions of medication abuse (Task 4). Our approaches relied on a combination of traditional machine learning and deep learning models. On Tasks 2 and 4, our submitted runs performed at or above the task average.

pdf bib abs
How Far Can We Go with Just Out-of-the-box BERT Models?
Lucie Gattepaille

Social media have been seen as a promising data source for pharmacovigilance. Howev-er, methods for automatic extraction of Adverse Drug Reactions from social media plat-forms such as Twitter still need further development before they can be included reliably in routine pharmacovigilance practices. As the Bidirectional Encoder Representations from Transformer (BERT) models have shown great performance in many major NLP tasks recently, we decided to test its performance on the SMM4H Shared Tasks 1 to 3, by submitting results of pretrained and fine-tuned BERT models without more added knowledge than the one carried in the training datasets and additional datasets. Our three submissions all ended up above average over all teams’ submissions: 0.766 F1 for task 1 (15% above the average of 0.665), 0.47 F1 for task 2 (2% above the average of 0.46) and 0.380 F1 score for task 3 (30% above the average of 0.292). Used in many of the high-ranking submission in the 2019 edition of the SMM4H Shared Task, BERT contin-ues to be state-of-the-art in ADR extraction for Twitter data.

pdf bib abs
FBK@SMM4H2020: RoBERTa for Detecting Medications on Twitter
Silvia Casola | Alberto Lavelli

This paper describes a classifier for tweets that mention medications or supplements, based on a pretrained transformer. We developed such a system for our participation in Subtask 1 of the Social Media Mining for Health Application workshop, which featured an extremely unbalanced dataset. The model showed promising results, with an F1 of 0.8 (task mean: 0.66).

pdf bib abs
Autobots Ensemble: Identifying and Extracting Adverse Drug Reaction from Tweets Using Transformer Based Pipelines
Sougata Saha | Souvik Das | Prashi Khurana | Rohini Srihari

This paper details a system designed for Social Media Mining for Health Applications (SMM4H) Shared Task 2020. We specifically describe the systems designed to solve task 2: Automatic classification of multilingual tweets that report adverse effects, and task 3: Automatic extraction and normalization of adverse effects in English tweets. Fine tuning RoBERTa large for classifying English tweets enables us to achieve a F1 score of 56%, which is an increase of +10% compared to the average F1 score for all the submissions. Using BERT based NER and question answering, we are able to achieve a F1 score of 57.6% for extracting adverse reaction mentions from tweets, which is an increase of +1.2% compared to the average F1 score for all the submissions.

pdf bib abs
Transformer Models for Drug Adverse Effects Detection from Tweets
Pavel Blinov | Manvel Avetisian

In this paper we present the drug adverse effects detection system developed during our participation in the Social Media Mining for Health Applications Shared Task 2020. We experimented with transfer learning approach for English and Russian, BERT and RoBERTa architectures and several strategies for regression head composition. Our final submissions in both languages overcome average F1 by several percents margin.

pdf bib abs
Adverse Drug Reaction Detection in Twitter Using RoBERTa and Rules
Sedigh Khademi | Pari Delirhaghighi | Frada Burstein

This paper describes the method we developed for the Task 2 English variation of the Social Media Mining for Health Applications (SMM4H) 2020 shared task. The task was to classify tweets containing adverse effects (AE) after medication intake. Our approach combined transfer learning using a RoBERTa Large Transformer model with a rule-based post-prediction correction to improve model precision. The model’s F1-Score of 0.56 on the test dataset was 10% better than the mean of the F1-Score of the best submissions in the task.

pdf bib abs
SpeechTrans@SMM4H’20: Impact of Preprocessing and N-grams on Automatic Classification of Tweets That Mention Medications
Mohamed Lichouri | Mourad Abbas

This paper describes our system developed for automatically classifying tweets that mention medications. We used the Decision Tree classifier for this task. We have shown that using some elementary preprocessing steps and TF-IDF n-grams led to acceptable classifier performance. Indeed, the F1-score recorded was 74.58% in the development phase and 63.70% in the test phase.

pdf bib abs
Want to Identify, Extract and Normalize Adverse Drug Reactions in Tweets? Use RoBERTa
Katikapalli Subramanyam Kalyan | Sivanesan Sangeetha

This paper presents our approach for task 2 and task 3 of Social Media Mining for Health (SMM4H) 2020 shared tasks. In task 2, we have to differentiate adverse drug reaction (ADR) tweets from nonADR tweets and is treated as binary classification. Task 3 involves extracting ADR mentions and then mapping them to MedDRA codes. Extracting ADR mentions is treated as sequence labeling and normalizing ADR mentions is treated as multi-class classification. Our system is based on pre-trained language model RoBERTa and it achieves a) F1-score of 58% in task 2 which is 12% more than the average score b) relaxed F1-score of 70.1% in ADR extraction of task 3 which is 13.7% more than the average score and relaxed F1-score of 35% in ADR extraction + normalization of task 3 which is 5.8% more than the average score. Overall, our models achieve promising results in both the tasks with significant improvements over average scores.

pdf bib abs
Detecting Tweets Reporting Birth Defect Pregnancy Outcome Using Two-View CNN RNN Based Architecture
Saichethan Reddy

This research work addresses a new multi-class classification task (fifth task) provided at the fifth Social Media Mining for Health Applications (SMM4H) workshop. This automatic tweet classification task involves distinguishing three classes of tweets that mention birth defects. We propose a novel two view based CNN-BiGRU based architectures for this task. Experimental evaluation of our proposed architecture over the validation set gives encouraging results as it improves by approximately 7% over our single view model for the fifth task. Code of our proposed framework is made available on Github

pdf bib abs
Identification of Medication Tweets Using Domain-specific Pre-trained Language Models
Yandrapati Prakash Babu | Rajagopal Eswari

In this paper, we present our approach for task1 of SMM4H 2020. This task involves automatic classification of tweets mentioning medication or dietary supplements. For this task, we experiment with pre-trained models like Biomedical RoBERTa, Clinical BERT and Biomedical BERT. Our approach achieves F1-score of 73.56%.

This study describes our proposed model design for the SMM4H 2020 Task 1. We fine-tune ELECTRA transformers using our trained SVM filter for data augmentation, along with decision trees to detect medication mentions in tweets. Our best F1-score of 0.7578 exceeded the mean score 0.6646 of all 15 submitting teams.

This paper describes our participation to the SMM4H shared task 2. We designed a rule-based classifier that estimates whether a tweet mentions an adverse effect associated to a medication. Our system addresses English and French, and is based on a number of specific word lists and features. These cues were mostly obtained through an extensive corpus analysis of the provided training data. Different weighting schemes were tested (manually tuned or based on a logistic regression), the best one achieving a F1 score of 0.31 for English and 0.15 for French.

pdf bib abs
Sentence Classification with Imbalanced Data for Health Applications
Farhana Ferdousi Liza

Identifying and extracting reports of medications, their abuse or adverse effects from social media is a challenging task. In social media, relevant reports are very infrequent, causes imbalanced class distribution for machine learning algorithms. Learning algorithms typically designed to optimize the overall accuracy without considering the relative distribution of each class. Thus, imbalanced class distribution is problematic as learning algorithms have low predictive accuracy for the infrequent class. Moreover, social media represents natural linguistic variation in creative language expressions. In this paper, we have used a combination of data balancing and neural language representation techniques to address the challenges. Specifically, we participated the shared tasks 1, 2 (all languages), 4, and 3 (only the span detection, no normalization was attempted) in Social Media Mining for Health applications (SMM4H) 2020 (Klein et al., 2020). The results show that with the proposed methodology recall scores are better than the precision scores for the shared tasks. The recall score is also better compared to the mean score of the total submissions. However, the F1-score is worse than the mean score except for task 2 (French).

pdf bib abs
HITSZ-ICRC: A Report for SMM4H Shared Task 2020-Automatic Classification of Medications and Adverse Effect in Tweets
Xiaoyu Zhao | Ying Xiong | Buzhou Tang

This is the system description of the Harbin Institute of Technology Shenzhen (HITSZ) team for the first and second subtasks of the fifth Social Media Mining for Health Applications (SMM4H) shared task in 2020. The first task is automatic classification of tweets that mention medications and the second task is automatic classification of tweets in English that report adverse effects. The system we propose for these tasks is based on bidirectional encoder representations from transformers (BERT) incorporating with knowledge graph and retrieving evidence from online information. Our system achieves an F1 of 0.7553 in task 1 and an F1 of 0.5455 in task 2.

pdf bib abs
Automatic Classification of Tweets Mentioning a Medication Using Pre-trained Sentence Encoders
Laiba Mehnaz

This paper describes our submission to the 5th edition of the Social Media Mining for Health Applications (SMM4H) shared task 1. Task 1 aims at the automatic classification of tweets that mention a medication or a dietary supplement. This task is specifically challenging due to its highly imbalanced dataset, with only 0.2% of the tweets mentioning a drug. For our submission, we particularly focused on several pretrained encoders for text classification. We achieve an F1 score of 0.75 for the positive class on the test set.

pdf bib abs
Approaching SMM4H 2020 with Ensembles of BERT Flavours
George-Andrei Dima | Andrei-Marius Avram | Dumitru-Clementin Cercel

This paper describes our solutions submitted to the Social Media Mining for Health Applications (#SMM4H) Shared Task 2020. We participated in the following tasks: Task 1 aimed at classifying if a tweet reports medications or not, Task 2 (only for the English dataset) aimed at discriminating if a tweet mentions adverse effects or not, and Task 5 aimed at recognizing if a tweet mentions birth defects or not. Our work focused on studying different neural network architectures based on various flavors of bidirectional Transformers (i.e., BERT), in the context of the previously mentioned classification tasks. For Task 1, we achieved an F1-score (70.5%) above the mean performance of the best scores made by all teams, whereas for Task 2, we obtained an F1-score of 37%. Also, we achieved a micro-averaged F1-score of 62% for Task 5.

pdf bib abs
NLP@VCU: Identifying Adverse Effects in English Tweets for Unbalanced Data
Darshini Mahendran | Cora Lewis | Bridget McInnes

This paper describes our participation in the Social Media Mining for Health Application (SMM4H 2020) Challenge Track 2 for identifying tweets containing Adverse Effects (AEs). Our system uses Convolutional Neural Networks. We explore downsampling, oversampling, and adjusting the class weights to account for the imbalanced nature of the dataset. Our results showed downsampling outperformed oversampling and adjusting the class weights on the test set however all three obtained similar results on the development set.

pdf bib abs
Sentence Transformers and Bayesian Optimization for Adverse Drug Effect Detection from Twitter
Oguzhan Gencoglu

This paper describes our approach for detecting adverse drug effect mentions on Twitter as part of the Social Media Mining for Health Applications (SMM4H) 2020, Shared Task 2. Our approach utilizes multilingual sentence embeddings (sentence-BERT) for representing tweets and Bayesian hyperparameter optimization of sample weighting parameter for counterbalancing high class imbalance.

pdf bib abs
Sentence Contextual Encoder with BERT and BiLSTM for Automatic Classification with Imbalanced Medication Tweets
Olanrewaju Tahir Aduragba | Jialin Yu | Gautham Senthilnathan | Alexandra Crsitea

This paper details the system description and approach used by our team for the SMM4H 2020 competition, Task 1. Task 1 targets the automatic classification of tweets that mention medication. We adapted the standard BERT pretrain-then-fine-tune approach to include an intermediate training stage with a biLSTM architecture neural network acting as a further fine-tuning stage. We were inspired by the effectiveness of within-task further pre-training and sentence encoders. We show that this approach works well for a highly imbalanced dataset. In this case, the positive class is only 0.2% of the entire dataset. Our model performed better in both F1 and precision scores compared to the mean score for all participants in the competition and had a competitive recall score.

pdf bib abs
CLaC at SMM4H 2020: Birth Defect Mention Detection
Parsa Bagherzadeh | Sabine Bergler

For the detection of personal tweets, where a parent speaks of a child’s birth defect, CLaC combines ELMo word embeddings and gazetteer lists from external resources with a GCNN (for encoding dependencies), in a multi layer, transformer inspired architecture. To address the task, we compile several gazetteer lists from resources such as MeSH and GI. The proposed system obtains .69 for μF1 score in the SMM4H 2020 Task 5 where the competition average is .65.