2024
pdf
bib
abs
Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP
Dongfang Xu
|
Guillermo Garcia
|
Lisa Raithel
|
Philippe Thomas
|
Roland Roller
|
Eiji Aramaki
|
Shoko Wakamiya
|
Shuntaro Yada
|
Pierre Zweigenbaum
|
Karen O’Connor
|
Sai Samineni
|
Sophia Hernandez
|
Yao Ge
|
Swati Rajwal
|
Sudeshna Das
|
Abeed Sarker
|
Ari Klein
|
Ana Schmidt
|
Vishakha Sharma
|
Raul Rodriguez-Esteban
|
Juan Banda
|
Ivan Amaro
|
Davy Weissenbacher
|
Graciela Gonzalez-Hernandez
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.
pdf
bib
abs
Unveiling Voices: Identification of Concerns in a Social Media Breast Cancer Cohort via Natural Language Processing
Swati Rajwal
|
Avinash Kumar Pandey
|
Zhishuo Han
|
Abeed Sarker
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
We leveraged a dataset of ∼1.5 million Twitter (now X) posts to develop a framework for analyzing breast cancer (BC) patients’ concerns and possible reasons for treatment discontinuation. Our primary objectives were threefold: (1) to curate and collect data from a BC cohort; (2) to identify topics related to uncertainty/concerns in BC-related posts; and (3) to conduct a sentiment intensity analysis of posts to identify and analyze negatively polarized posts. RoBERTa outperformed other models with a micro-averaged F1 score of 0.894 and a macro-averaged F1 score of 0.853 for (1). For (2), we used GPT-4 and BERTopic, and qualitatively analyzed posts under relevant topics. For (3), sentiment intensity analysis of posts followed by qualitative analyses shed light on potential reasons behind treatment discontinuation. Our work demonstrates the utility of social media mining to discover BC patient concerns. Information derived from the cohort data may help design strategies in the future for increasing treatment compliance.
pdf
bib
abs
EM_Mixers at MEDIQA-CORR 2024: Knowledge-Enhanced Few-Shot In-Context Learning for Medical Error Detection and Correction
Swati Rajwal
|
Eugene Agichtein
|
Abeed Sarker
Proceedings of the 6th Clinical Natural Language Processing Workshop
This paper describes our submission to MEDIQA-CORR 2024 shared task for automatic identification and correction of medical errors in a given clinical text. We report results from two approaches: the first uses a few-shot in-context learning (ICL) with a Large Language Model (LLM) and the second approach extends the idea by using a knowledge-enhanced few-shot ICL approach. We used Azure OpenAI GPT-4 API as the LLM and Wikipedia as the external knowledge source. We report evaluation metrics (accuracy, ROUGE, BERTScore, BLEURT) across both approaches for validation and test datasets. Of the two approaches implemented, our experimental results show that the knowledge-enhanced few-shot ICL approach with GPT-4 performed better with error flag (subtask A) and error sentence detection (subtask B) with accuracies of 68% and 64%, respectively on the test dataset. These results positioned us fourth in subtask A and second in subtask B, respectively in the shared task.
2023
pdf
bib
abs
SocBERT: A Pretrained Model for Social Media Text
Yuting Guo
|
Abeed Sarker
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP
Pretrained language models (PLMs) on domain-specific data have been proven to be effective for in-domain natural language processing (NLP) tasks. Our work aimed to develop a language model which can be effective for the NLP tasks with the data from diverse social media platforms. We pretrained a language model on Twitter and Reddit posts in English consisting of 929M sequence blocks for 112K steps. We benchmarked our model and 3 transformer-based models—BERT, BERTweet, and RoBERTa on 40 social media text classification tasks. The results showed that although our model did not perform the best on all of the tasks, it outperformed the baseline model—BERT on most of the tasks, which illustrates the effectiveness of our model. Also, our work provides some insights of how to improve the efficiency of training PLMs.
2022
pdf
bib
abs
Overview of the Seventh Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2022
Davy Weissenbacher
|
Juan Banda
|
Vera Davydova
|
Darryl Estrada Zavala
|
Luis Gasco Sánchez
|
Yao Ge
|
Yuting Guo
|
Ari Klein
|
Martin Krallinger
|
Mathias Leddin
|
Arjun Magge
|
Raul Rodriguez-Esteban
|
Abeed Sarker
|
Lucia Schmidt
|
Elena Tutubalina
|
Graciela Gonzalez-Hernandez
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
For the past seven years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted the community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in public, user-generated content. This seventh iteration consists of ten tasks that include English and Spanish posts on Twitter, Reddit, and WebMD. Interest in the #SMM4H shared tasks continues to grow, with 117 teams that registered and 54 teams that participated in at least one task—a 17.5% and 35% increase in registration and participation, respectively, over the last iteration. This paper provides an overview of the tasks and participants’ systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.
2021
pdf
bib
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Arjun Magge
|
Ari Klein
|
Antonio Miranda-Escalada
|
Mohammed Ali Al-garadi
|
Ilseyar Alimova
|
Zulfat Miftahutdinov
|
Eulalia Farre-Maduell
|
Salvador Lima Lopez
|
Ivan Flores
|
Karen O'Connor
|
Davy Weissenbacher
|
Elena Tutubalina
|
Abeed Sarker
|
Juan M Banda
|
Martin Krallinger
|
Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
pdf
bib
abs
Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021
Arjun Magge
|
Ari Klein
|
Antonio Miranda-Escalada
|
Mohammed Ali Al-Garadi
|
Ilseyar Alimova
|
Zulfat Miftahutdinov
|
Eulalia Farre
|
Salvador Lima López
|
Ivan Flores
|
Karen O’Connor
|
Davy Weissenbacher
|
Elena Tutubalina
|
Abeed Sarker
|
Juan Banda
|
Martin Krallinger
|
Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
The global growth of social media usage over the past decade has opened research avenues for mining health related information that can ultimately be used to improve public health. The Social Media Mining for Health Applications (#SMM4H) shared tasks in its sixth iteration sought to advance the use of social media texts such as Twitter for pharmacovigilance, disease tracking and patient centered outcomes. #SMM4H 2021 hosted a total of eight tasks that included reruns of adverse drug effect extraction in English and Russian and newer tasks such as detecting medication non-adherence from Twitter and WebMD forum, detecting self-reported adverse pregnancy outcomes, detecting cases and symptoms of COVID-19, identifying occupations mentioned in Spanish by Twitter users, and detecting self-reported breast cancer diagnosis. The eight tasks included a total of 12 individual subtasks spanning three languages requiring methods for binary classification, multi-class classification, named entity recognition and entity normalization. With a total of 97 registering teams and 40 teams submitting predictions, the interest in the shared tasks grew by 70% and participation grew by 38% compared to the previous iteration.
pdf
bib
abs
Pre-trained Transformer-based Classification and Span Detection Models for Social Media Health Applications
Yuting Guo
|
Yao Ge
|
Mohammed Ali Al-Garadi
|
Abeed Sarker
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
This paper describes our approach for six classification tasks (Tasks 1a, 3a, 3b, 4 and 5) and one span detection task (Task 1b) from the Social Media Mining for Health (SMM4H) 2021 shared tasks. We developed two separate systems for classification and span detection, both based on pre-trained Transformer-based models. In addition, we applied oversampling and classifier ensembling in the classification tasks. The results of our submissions are over the median scores in all tasks except for Task 1a. Furthermore, our model achieved first place in Task 4 and obtained a 7% higher F1-score than the median in Task 1b.
pdf
bib
abs
An Ensemble Model for Automatic Grading of Evidence
Yuting Guo
|
Yao Ge
|
Ruqi Liao
|
Abeed Sarker
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
This paper describes our approach for the automatic grading of evidence task from the Australasian Language Technology Association (ALTA) Shared Task 2021. We developed two classification models with SVM and RoBERTa and applied an ensemble technique to combine the grades from different classifiers. Our results showed that the SVM model achieved comparable results to the RoBERTa model, and the ensemble system outperformed the individual models on this task. Our system achieved the first place among five teams and obtained 3.3% higher accuracy than the second place.
2020
pdf
bib
abs
Emory at WNUT-2020 Task 2: Combining Pretrained Deep Learning Models and Feature Enrichment for Informative Tweet Identification
Yuting Guo
|
Mohammed Ali Al-Garadi
|
Abeed Sarker
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
This paper describes the system developed by the Emory team for the WNUT-2020 Task 2: “Identifi- cation of Informative COVID-19 English Tweet”. Our system explores three recent Transformer- based deep learning models pretrained on large- scale data to encode documents. Moreover, we developed two feature enrichment methods to en- hance document embeddings by integrating emoji embeddings and syntactic features into deep learn- ing models. Our system achieved F1-score of 0.897 and accuracy of 90.1% on the test set, and ranked in the top-third of all 55 teams.
pdf
bib
abs
Benchmarking of Transformer-Based Pre-Trained Models on Social Media Text Classification Datasets
Yuting Guo
|
Xiangjue Dong
|
Mohammed Ali Al-Garadi
|
Abeed Sarker
|
Cecile Paris
|
Diego Mollá Aliod
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
Free text data from social media is now widely used in natural language processing research, and one of the most common machine learning tasks performed on this data is classification. Generally speaking, performances of supervised classification algorithms on social media datasets are lower than those on texts from other sources, but recently-proposed transformer-based models have considerably improved upon legacy state-of-the-art systems. Currently, there is no study that compares the performances of different variants of transformer-based models on a wide range of social media text classification datasets. In this paper, we benchmark the performances of transformer-based pre-trained models on 25 social media text classification datasets, 6 of which are health-related. We compare three pre-trained language models, RoBERTa-base, BERTweet and ClinicalBioBERT in terms of classification accuracy. Our experiments show that RoBERTa-base and BERTweet perform comparably on most datasets, and considerably better than ClinicalBioBERT, even on health-related datasets.
pdf
bib
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
Graciela Gonzalez-Hernandez
|
Ari Z. Klein
|
Ivan Flores
|
Davy Weissenbacher
|
Arjun Magge
|
Karen O'Connor
|
Abeed Sarker
|
Anne-Lyse Minard
|
Elena Tutubalina
|
Zulfat Miftahutdinov
|
Ilseyar Alimova
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
pdf
bib
abs
Overview of the Fifth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2020
Ari Klein
|
Ilseyar Alimova
|
Ivan Flores
|
Arjun Magge
|
Zulfat Miftahutdinov
|
Anne-Lyse Minard
|
Karen O’Connor
|
Abeed Sarker
|
Elena Tutubalina
|
Davy Weissenbacher
|
Graciela Gonzalez-Hernandez
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
The vast amount of data on social media presents significant opportunities and challenges for utilizing it as a resource for health informatics. The fifth iteration of the Social Media Mining for Health Applications (#SMM4H) shared tasks sought to advance the use of Twitter data (tweets) for pharmacovigilance, toxicovigilance, and epidemiology of birth defects. In addition to re-runs of three tasks, #SMM4H 2020 included new tasks for detecting adverse effects of medications in French and Russian tweets, characterizing chatter related to prescription medication abuse, and detecting self reports of birth defect pregnancy outcomes. The five tasks required methods for binary classification, multi-class classification, and named entity recognition (NER). With 29 teams and a total of 130 system submissions, participation in the #SMM4H shared tasks continues to grow.
2019
pdf
bib
abs
Overview of the Fourth Social Media Mining for Health (SMM4H) Shared Tasks at ACL 2019
Davy Weissenbacher
|
Abeed Sarker
|
Arjun Magge
|
Ashlynn Daughton
|
Karen O’Connor
|
Michael J. Paul
|
Graciela Gonzalez-Hernandez
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
The number of users of social media continues to grow, with nearly half of adults worldwide and two-thirds of all American adults using social networking. Advances in automated data processing, machine learning and NLP present the possibility of utilizing this massive data source for biomedical and public health applications, if researchers address the methodological challenges unique to this media. We present the Social Media Mining for Health Shared Tasks collocated with the ACL at Florence in 2019, which address these challenges for health monitoring and surveillance, utilizing state of the art techniques for processing noisy, real-world, and substantially creative language expressions from social media users. For the fourth execution of this challenge, we proposed four different tasks. Task 1 asked participants to distinguish tweets reporting an adverse drug reaction (ADR) from those that do not. Task 2, a follow-up to Task 1, asked participants to identify the span of text in tweets reporting ADRs. Task 3 is an end-to-end task where the goal was to first detect tweets mentioning an ADR and then map the extracted colloquial mentions of ADRs in the tweets to their corresponding standard concept IDs in the MedDRA vocabulary. Finally, Task 4 asked participants to classify whether a tweet contains a personal mention of one’s health, a more general discussion of the health issue, or is an unrelated mention. A total of 34 teams from around the world registered and 19 teams from 12 countries submitted a system run. We summarize here the corpora for this challenge which are freely available at
https://competitions.codalab.org/competitions/22521, and present an overview of the methods and the results of the competing systems.
2018
pdf
bib
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
Graciela Gonzalez-Hernandez
|
Davy Weissenbacher
|
Abeed Sarker
|
Michael Paul
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
pdf
bib
abs
Overview of the Third Social Media Mining for Health (SMM4H) Shared Tasks at EMNLP 2018
Davy Weissenbacher
|
Abeed Sarker
|
Michael J. Paul
|
Graciela Gonzalez-Hernandez
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
The goals of the SMM4H shared tasks are to release annotated social media based health related datasets to the research community, and to compare the performances of natural language processing and machine learning systems on tasks involving these datasets. The third execution of the SMM4H shared tasks, co-hosted with EMNLP-2018, comprised of four subtasks. These subtasks involve annotated user posts from Twitter (tweets) and focus on the (i) automatic classification of tweets mentioning a drug name, (ii) automatic classification of tweets containing reports of first-person medication intake, (iii) automatic classification of tweets presenting self-reports of adverse drug reaction (ADR) detection, and (iv) automatic classification of vaccine behavior mentions in tweets. A total of 14 teams participated and 78 system runs were submitted (23 for task 1, 20 for task 2, 18 for task 3, 17 for task 4).
2017
pdf
bib
abs
HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors
Abeed Sarker
|
Graciela Gonzalez
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
We present a simple supervised text classification system that combines sparse and dense vector representations of words, and generalized representations of words via clusters. The sparse vectors are generated from word n-gram sequences (1-3). The dense vector representations of words (embeddings) are learned by training a neural network to predict neighboring words in a large unlabeled dataset. To classify a text segment, the different representations of it are concatenated, and the classification is performed using Support Vector Machines (SVM). Our system is particularly intended for use by non-experts of natural language processing and machine learning, and, therefore, the system does not require any manual tuning of parameters or weights. Given a training set, the system automatically generates the training vectors, optimizes the relevant hyper-parameters for the SVM classifier, and trains the classification model. We evaluated this system on the SemEval-2017 English sentiment analysis task. In terms of average F1-score, our system obtained 8th position out of 39 submissions (F1-score: 0.632, average recall: 0.637, accuracy: 0.646).
pdf
bib
abs
Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System
Ari Klein
|
Abeed Sarker
|
Masoud Rouhizadeh
|
Karen O’Connor
|
Graciela Gonzalez
BioNLP 2017
Social media sites (e.g., Twitter) have been used for surveillance of drug safety at the population level, but studies that focus on the effects of medications on specific sets of individuals have had to rely on other sources of data. Mining social media data for this in-formation would require the ability to distinguish indications of personal medication in-take in this media. Towards that end, this paper presents an annotated corpus that can be used to train machine learning systems to determine whether a tweet that mentions a medication indicates that the individual posting has taken that medication at a specific time. To demonstrate the utility of the corpus as a training set, we present baseline results of supervised classification.
2016
pdf
bib
DiegoLab16 at SemEval-2016 Task 4: Sentiment Analysis in Twitter using Centroids, Clusters, and Sentiment Lexicons
Abeed Sarker
|
Graciela Gonzalez
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
pdf
bib
abs
Data, tools and resources for mining social media drug chatter
Abeed Sarker
|
Graciela Gonzalez
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)
Social media has emerged into a crucial resource for obtaining population-based signals for various public health monitoring and surveillance tasks, such as pharmacovigilance. There is an abundance of knowledge hidden within social media data, and the volume is growing. Drug-related chatter on social media can include user-generated information that can provide insights into public health problems such as abuse, adverse reactions, long-term effects, and multi-drug interactions. Our objective in this paper is to present to the biomedical natural language processing, data science, and public health communities data sets (annotated and unannotated), tools and resources that we have collected and created from social media. The data we present was collected from Twitter using the generic and brand names of drugs as keywords, along with their common misspellings. Following the collection of the data, annotation guidelines were created over several iterations, which detail important aspects of social media data annotation and can be used by future researchers for developing similar data sets. The annotation guidelines were followed to prepare data sets for text classification, information extraction and normalization. In this paper, we discuss the preparation of these guidelines, outline the data sets prepared, and present an overview of our state-of-the-art systems for data collection, supervised classification, and information extraction. In addition to the development of supervised systems for classification and extraction, we developed and released unlabeled data and language models. We discuss the potential uses of these language models in data mining and the large volumes of unlabeled data from which they were generated. We believe that the summaries and repositories we present here of our data, annotation guidelines, models, and tools will be beneficial to the research community as a single-point entry for all these resources, and will promote further research in this area.
2015
pdf
bib
DIEGOLab: An Approach for Message-level Sentiment Classification in Twitter
Abeed Sarker
|
Azadeh Nikfarjam
|
Davy Weissenbacher
|
Graciela Gonzalez
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
pdf
bib
Impact of Citing Papers for Summarisation of Clinical Documents
Diego Mollá
|
Christopher Jones
|
Abeed Sarker
Proceedings of the Australasian Language Technology Association Workshop 2014
2013
pdf
bib
Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification
Abeed Sarker
|
Diego Mollá-Aliod
|
Cécile Paris
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
pdf
bib
Towards Two-step Multi-document Summarisation for Evidence Based Medicine: A Quantitative Analysis
Abeed Sarker
|
Diego Mollá-Aliod
|
Cécile Paris
Proceedings of the Australasian Language Technology Association Workshop 2012
2011
pdf
bib
Automatic Grading of Evidence: the 2011 ALTA Shared Task
Diego Molla
|
Abeed Sarker
Proceedings of the Australasian Language Technology Association Workshop 2011
pdf
bib
Outcome Polarity Identification of Medical Papers
Abeed Sarker
|
Diego Molla
|
Cécile Paris
Proceedings of the Australasian Language Technology Association Workshop 2011