Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
Dongfang Xu
|
Graciela Gonzalez-Hernandez
ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents
Thang Ta
|
Abu Rahman
|
Lotfollah Najjar
|
Alexander Gelbukh
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop, specifically targeting the classification challenges within tweet data. Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety. Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children. We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets. We also applied several data augmentation methods to assess their impact on model performance. Finally, our systems obtained a best F1 score of 0.627 in Task 3 and a best F1 score of 0.841 in Task 5.
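A minimal sketch of treating tweet classification as text generation with an encoder-decoder model, as the abstract describes for BART-base and T5-small; the task prefix and label verbalization shown here are illustrative assumptions, not the authors' actual setup.

```python
# Sketch only: classification via label-word generation with T5-small.
# The "classify tweet:" prefix and the label strings are assumed, not from the paper.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tok = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tweet = "classify tweet: my daughter was just diagnosed with asthma"
input_ids = tok(tweet, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=4)
print(tok.decode(output_ids[0], skip_special_tokens=True))  # decoded label word
```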
CTYUN-AI@SMM4H-2024: Knowledge Extension Makes Expert Models
Yuming Fan
|
Dongming Yang
|
Lina Cao
This paper explores the potential of social media as a rich source of data for understanding public health trends and behaviors, particularly focusing on emotional well-being and the impact of environmental factors. We employed large language models (LLMs) and developed a suite of knowledge extension techniques to analyze social media content related to mental health issues, specifically examining 1) effects of outdoor spaces on social anxiety symptoms on Reddit, 2) tweets reporting children’s medical disorders, and 3) self-reported ages in posts on Twitter and Reddit. Our knowledge extension approach encompasses both supervised data (i.e., sample augmentation and cross-task fine-tuning) and unsupervised data (i.e., knowledge distillation and cross-task pre-training), tackling the inherent challenges of sample imbalance and informality of social media language. The effectiveness of our approach is demonstrated by its superior performance across multiple tasks (i.e., Tasks 3, 5, and 6) at SMM4H-2024. Notably, we achieved the best performance in all three tasks, underscoring the utility of our models in real-world applications.
DILAB at #SMM4H 2024: RoBERTa Ensemble for Identifying Children’s Medical Disorders in English Tweets
Azmine Toushik Wasi
|
Sheikh Rahman
This paper details our system developed for the 9th Social Media Mining for Health Research and Applications Workshop (SMM4H 2024), addressing Task 5, which focuses on the binary classification of English tweets reporting children’s medical disorders. Our objective was to enhance the detection of tweets related to children’s medical issues. To do this, we use various pre-trained language models, such as RoBERTa and BERT. We fine-tuned these models on the task-specific dataset, adjusting model layers and hyperparameters in an attempt to optimize performance. Because we observed unstable fluctuations in performance metrics during training, we implemented an ensemble approach that combines predictions from different learning epochs. Our model achieves promising results, with the best-performing configuration achieving an F1 score of 93.8% on the validation set and 89.8% on the test set.
DILAB at #SMM4H 2024: Analyzing Social Anxiety Effects through Context-Aware Transfer Learning on Reddit Data
Sheikh Rahman
|
Azmine Toushik Wasi
This paper illustrates the system we designed for Task 3 of the 9th Social Media Mining for Health (SMM4H 2024) shared tasks. The task presents posts made on the Reddit social media platform, specifically the *r/SocialAnxiety* subreddit, along with one or more outdoor activities as pre-determined keywords for each post. The task then requires each post to be categorized as one of *positive*, *negative*, *no effect*, or *not outdoor activity*, based on the effect the keyword(s) have on social anxiety. Our approach focuses on fine-tuning pre-trained language models to classify the posts. Additionally, we use fuzzy string matching to select only the text around the given keywords, so that the model only has to focus on the contextual sentiment associated with the keywords. Using this system, our peak score is a macro-F1 of 0.65 on the validation set and 0.654 on the test set.
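As a rough illustration of the keyword-context idea described above, the sketch below uses Python's standard-library fuzzy matcher to locate the keyword in a post and keep a fixed-size token window around it; the window size and matching heuristic are assumptions, not the authors' exact settings.

```python
# Minimal sketch (assumed details): select a text window around the task keyword
# using fuzzy matching, so the classifier only sees keyword-local context.
import difflib

def keyword_window(post: str, keyword: str, window: int = 30) -> str:
    """Return roughly `window` tokens centred on the best fuzzy match of `keyword`."""
    tokens = post.split()
    best_idx, best_ratio = 0, 0.0
    for i, tok in enumerate(tokens):
        ratio = difflib.SequenceMatcher(None, tok.lower(), keyword.lower()).ratio()
        if ratio > best_ratio:
            best_idx, best_ratio = i, ratio
    start = max(0, best_idx - window // 2)
    return " ".join(tokens[start:start + window])

print(keyword_window("Went hikeing today and my anxiety felt a lot lower afterwards", "hiking"))
```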
Dolomites@#SMM4H 2024: Helping LLMs “Know The Drill” in Low-Resource Settings - A Study on Social Media Posts
Giuliano Tortoreto
|
Seyed Mahed Mousavi
The amount of data available to fine-tune LLMs plays a crucial role in the performance of these models on downstream tasks. Consequently, it is not straightforward to deploy these models in low-resource settings. In this work, we investigate two new multi-task learning data augmentation approaches for fine-tuning LLMs when little data is available: “In-domain Augmentation” of the training data and extracting “Drills” as smaller tasks from the target dataset. We evaluate the proposed approaches in three natural language processing settings in the context of the SMM4H 2024 competition tasks: multi-class classification, entity recognition, and information extraction. The results show that both techniques improve the performance of the models in all three settings, suggesting that the knowledge learned in multi-task training transfers positively to the target task.
RIGA at SMM4H-2024 Task 1: Enhancing ADE discovery with GPT-4
Eduards Mukans
|
Guntis Barzdins
The following is a description of the RIGA team’s submissions for SMM4H-2024 Task 1: extraction and normalization of adverse drug events (ADEs) in English tweets. Our approach focuses on utilizing Large Language Models (LLMs) to generate data that enhances the fine-tuning of classification and Named Entity Recognition (NER) models. Our solution significantly outperforms the mean and median submissions of other teams. The efficacy of our ADE extraction from tweets is comparable to the current state-of-the-art solution, established as the task baseline. The code for our method is available on GitHub (https://github.com/emukans/smm4h2024-riga).
Golden_Duck at #SMM4H 2024: A Transformer-based Approach to Social Media Text Classification
Md Ayon Mia
|
Mahshar Yahan
|
Hasan Murad
|
Muhammad Khan
In this paper, we address Task 3 on social anxiety disorder identification and Task 5 on mental illness recognition, organized by the SMM4H 2024 workshop. Task 3 presents a multi-class classification problem: classifying Reddit posts about outdoor spaces into four categories: Positive, Neutral, Negative, or Unrelated. Using the pre-trained RoBERTa-base model along with techniques like mean pooling, CLS, and an attention head, we scored an F1-score of 0.596 on the test dataset for Task 3. Task 5 aims to classify tweets into two categories: those describing a child with conditions like ADHD, ASD, delayed speech, or asthma (class 1), and those merely mentioning a disorder (class 0). Using the pre-trained RoBERTa-large model, incorporating a weighted ensemble of the last 4 hidden layers through concatenation and mean pooling, we achieved an F1 score of 0.928 on the test data for Task 5.
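A minimal sketch of the kind of last-four-hidden-layer pooling the abstract mentions, assuming a Hugging Face RoBERTa encoder; the concatenation-then-mean-pooling shown here, with an illustrative linear head, is one plausible reading of the description rather than the team's exact implementation.

```python
# Sketch only: concatenate the last four hidden layers of RoBERTa-large and
# mean-pool over tokens before a (hypothetical) binary classification head.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large")
enc = AutoModel.from_pretrained("roberta-large", output_hidden_states=True)
head = torch.nn.Linear(4 * enc.config.hidden_size, 2)  # assumed head, untrained

batch = tok(["my son was diagnosed with asthma last week"], return_tensors="pt")
with torch.no_grad():
    hidden = enc(**batch).hidden_states            # tuple: embeddings + 24 layers
stacked = torch.cat(hidden[-4:], dim=-1)           # (batch, seq, 4 * hidden_size)
mask = batch["attention_mask"].unsqueeze(-1)
pooled = (stacked * mask).sum(1) / mask.sum(1)     # masked mean pooling over tokens
logits = head(pooled)                              # (batch, 2) class scores
```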
SRCB at #SMM4H 2024: Making Full Use of LLM-based Data Augmentation in Adverse Drug Event Extraction and Normalization
Hongyu Li
|
Yuming Zhang
|
Yongwei Zhang
|
Shanshan Jiang
|
Bin Dong
This paper reports on the performance of SRCB’s system in the Social Media Mining for Health (#SMM4H) 2024 Shared Task 1: extraction and normalization of adverse drug events (ADEs) in English tweets. We develop a system composed of an ADE extraction module and an ADE normalization module, the latter further comprising a retrieval module and a filtering module. To alleviate the data imbalance and other issues introduced by the dataset, we employ 4 data augmentation techniques based on Large Language Models (LLMs) across both modules. Our best submission achieves an F1 score of 53.6 (49.4 on the unseen subset) on the ADE normalization task and an F1 score of 52.1 on the ADE extraction task.
LT4SG@SMM4H’24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models
Dasun Athukoralage
|
Thushari Atapattu
|
Menasha Thilakaratne
|
Katrina Falkner
This paper presents our approaches for the SMM4H’24 Shared Task 5 on the binary classification of English tweets reporting children’s medical disorders. Our first approach involves fine-tuning a single RoBERTa-large model, while the second approach entails ensembling the results of three fine-tuned BERTweet-large models. We demonstrate that although both approaches exhibit identical performance on validation data, the BERTweet-large ensemble excels on test data. Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark classifier by 1.18%.
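A minimal sketch of probability-averaging across several fine-tuned checkpoints, as in the BERTweet-large ensemble described above; the checkpoint paths are hypothetical placeholders, and the actual ensembling rule used by the team may differ.

```python
# Sketch only: average softmax probabilities from three fine-tuned classifiers
# and take the argmax as the ensemble label. Checkpoint paths are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpts = ["bertweet-run1", "bertweet-run2", "bertweet-run3"]  # hypothetical paths
tok = AutoTokenizer.from_pretrained("vinai/bertweet-large")

def ensemble_predict(text: str) -> int:
    batch = tok(text, return_tensors="pt", truncation=True)
    probs = []
    for ckpt in ckpts:
        model = AutoModelForSequenceClassification.from_pretrained(ckpt)
        with torch.no_grad():
            probs.append(torch.softmax(model(**batch).logits, dim=-1))
    return int(torch.stack(probs).mean(0).argmax(-1))
```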
UTRad-NLP at #SMM4H 2024: Why LLM-Generated Texts Fail to Improve Text Classification Models
Yosuke Yamagishi
|
Yuta Nakamura
In this paper, we present our approach to addressing the binary classification tasks, Tasks 5 and 6, as part of the Social Media Mining for Health (SMM4H) text classification challenge. Both tasks involved working with imbalanced datasets that featured a scarcity of positive examples. To mitigate this imbalance, we employed a Large Language Model to generate synthetic texts with positive labels, aiming to augment the training data for our text classification models. Unfortunately, this method did not significantly improve model performance. Through clustering analysis using text embeddings, we discovered that the generated texts significantly lacked diversity compared to the raw data. This finding highlights the challenges of using synthetic text generation for enhancing model efficacy in real-world applications, specifically in the context of health-related social media data.
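To make the diversity comparison concrete, here is a minimal sketch, with toy inputs and an assumed off-the-shelf sentence encoder, that contrasts the mean pairwise cosine distance of real versus LLM-generated texts; the clustering analysis in the paper is more involved, and this is only a rough probe.

```python
# Sketch only: a crude diversity probe. Higher mean pairwise cosine distance
# suggests a more diverse set of texts. Inputs and encoder are illustrative.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def mean_pairwise_distance(texts: list[str]) -> float:
    embs = encoder.encode(texts, convert_to_tensor=True)
    sims = util.cos_sim(embs, embs)
    n = len(texts)
    mean_sim = (sims.sum() - sims.diagonal().sum()) / (n * (n - 1))
    return float(1 - mean_sim)

real = ["my kid was diagnosed with adhd", "our son's asthma flared up again"]
synthetic = ["my child has adhd", "my child has adhd and asthma"]
print(mean_pairwise_distance(real), mean_pairwise_distance(synthetic))
```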
HBUT at #SMM4H 2024 Task1: Extraction and Normalization of Adverse Drug Events with a Large Language Model
Yuanzhi Ke
|
Hanbo Jin
|
Xinyun Wu
|
Caiquan Xiong
In this paper, we describe our proposed systems for the Social Media Mining for Health 2024 shared task 1. We built our system on the basis of GLM, a pre-trained large language model with few-shot learning capabilities, using a two-step prompting strategy to extract adverse drug events (ADEs) and an ensemble method for normalization. In the first step of the extraction phase, we extract all potential ADEs with in-context few-shot learning. In the second step, we use a tailored prompt to let GLM filter out the false positives produced in the first step. We then normalize each ADE to its MedDRA preferred term ID (ptID) with an ensemble method based on Reciprocal Rank Fusion (RRF). Our method achieved excellent recall, obtaining 41.1%, 42.8%, and 40.6% for ADE normalization, ADE recognition, and normalization of unseen ADEs, respectively. Compared to the average and median recall among all participants, our recall scores are generally 10%-20% higher than those of the other participants’ systems.
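A minimal, self-contained sketch of Reciprocal Rank Fusion, the fusion scheme named in the abstract; the candidate ptIDs and the constant k = 60 are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: Reciprocal Rank Fusion over candidate MedDRA ptID rankings.
# Each ranker contributes 1 / (k + rank); the highest fused score wins.
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranker_a = ["10000001", "10000002", "10000003"]   # hypothetical ptIDs from retriever A
ranker_b = ["10000002", "10000001", "10000004"]   # hypothetical ptIDs from retriever B
print(rrf([ranker_a, ranker_b])[0])               # fused top candidate
```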
SMM4H 2024: 5 Fold Cross Validation for Classification of tweets reporting children’s disorders
Lipika Dey
|
B Naik
|
Oppangi Poojita
|
Kovidh Pothireddi
This paper presents our system developed for the Social Media Mining for Health (SMM4H) 2024 Task 05. The task objective was binary classification of tweets provided in the dataset, distinguishing between those reporting medical disorders and those merely mentioning diseases. We address this challenge through the utilization of a 5-fold cross-validation approach, employing the RoBERTa-Large model. Evaluation results demonstrate an F1-score of 0.886 on the validation dataset and 0.823 on the test dataset.
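A minimal sketch of the stratified 5-fold setup described above, on toy data; in the actual system each fold would fine-tune a RoBERTa-Large classifier rather than just report split sizes.

```python
# Sketch only: stratified 5-fold cross-validation over toy tweet labels.
from sklearn.model_selection import StratifiedKFold

# Toy tweets: 1 = reports a child's medical disorder, 0 = merely mentions a disease.
texts = ["my daughter has adhd", "my son has asthma", "our kid's speech is delayed",
         "my child was diagnosed with asd", "my toddler's asthma flared up",
         "adhd awareness month", "asthma rates are rising", "speech therapy is costly",
         "autism research news", "new asthma inhaler approved"]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(texts, labels)):
    # In the real system: fine-tune RoBERTa-Large on the train split, score the val split.
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)}")
```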
HBUT at #SMM4H 2024 Task2: Cross-lingual Few-shot Medical Entity Extraction using a Large Language Model
Yuanzhi Ke
|
Zhangju Yin
|
Xinyun Wu
|
Caiquan Xiong
Named entity recognition (NER) of drug and disorder/body function mentions in web text is challenging in the face of multilingualism, limited data, and poor data quality. Traditional small-scale models struggle to cope with the task. Large language models with conventional prompts also yield poor results. In this paper, we introduce our system, which employs a large language model (LLM) with a novel two-step prompting strategy. Instead of directly extracting the target medical entities, our system first extracts all entities and then prompts the LLM to extract drug and disorder entities, given the full entity list and the original input text as context. The experimental and test results indicate that this strategy successfully enhanced our system’s performance, especially for German.
PCIC at SMM4H 2024: Enhancing Reddit Post Classification on Social Anxiety Using Transformer Models and Advanced Loss Functions
Leon Hecht
|
Victor Pozos
|
Helena Gomez Adorno
|
Gibran Fuentes-Pineda
|
Gerardo Sierra
|
Gemma Bel-Enguix
We present our approach to solving the task of identifying the effect of outdoor activities on social anxiety based on Reddit posts. We employed state-of-the-art transformer models enhanced with a combination of advanced loss functions. Data augmentation techniques were also used to address class imbalance within the training set. Our method achieved a macro-averaged F1-score of 0.655 on the test data, surpassing the workshop’s mean F1-score of 0.519. These findings suggest that integrating weighted loss functions improves the performance of transformer models in classifying unbalanced text data, while data augmentation can improve the model’s ability to generalize.
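As an illustration of the weighted-loss idea mentioned above, here is a minimal class-weighted cross-entropy sketch in PyTorch; the class counts and weighting scheme are assumptions, and the paper's "advanced loss functions" may well differ.

```python
# Sketch only: class-weighted cross-entropy so minority classes in a four-way
# label set contribute more to the loss. Class counts are hypothetical.
import torch

class_counts = torch.tensor([120.0, 900.0, 300.0, 80.0])   # assumed counts per class
weights = class_counts.sum() / (len(class_counts) * class_counts)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)                  # classifier outputs for 8 posts
targets = torch.randint(0, 4, (8,))         # gold labels
loss = loss_fn(logits, targets)
print(float(loss))
```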
Transformers at #SMM4H 2024: Identification of Tweets Reporting Children’s Medical Disorders And Effects of Outdoor Spaces on Social Anxiety Symptoms on Reddit Using RoBERTa
Kriti Singhal
|
Jatin Bedi
With the widespread increase in the use of social media platforms such as Twitter, Instagram, and Reddit, people are sharing their views on various topics. They have become more vocal on these platforms about their views and opinions on the medical challenges they are facing. This data is a valuable source of medical insights for healthcare study and research. This paper describes our adoption of transformer-based approaches for Tasks 3 and 5. For both tasks, we fine-tuned RoBERTa-large, a BERT-based architecture, and achieved highest F1 scores of 0.413 and 0.900 in Tasks 3 and 5, respectively.
Enhancing Social Media Health Prediction Certainty by Integrating Large Language Models with Transformer Classifiers
Sedigh Khademi
|
Christopher Palmer
|
Muhammad Javed
|
Jim Buttery
|
Gerardo Dimaguila
This paper presents our approach for SMM4H 2024 Task 5, focusing on identifying tweets where users discuss their child’s health conditions of ADHD, ASD, delayed speech, or asthma. Our approach uses a pipeline that combines transformer-based classifiers and GPT-4 large language models (LLMs). We first address data imbalance in the training set using topic modelling and under-sampling. Next, we train RoBERTa-based classifiers on the adjusted data. Finally, GPT-4 refines the classifier’s predictions for uncertain cases (confidence below 0.9). This strategy achieved significant improvement over the baseline RoBERTa models. Our work demonstrates the effectiveness of combining transformer classifiers and LLMs for extracting health insights from social media conversations.
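A minimal sketch of the confidence-threshold routing described above: predictions whose maximum softmax probability falls below 0.9 are deferred to an LLM for a second opinion. The `roberta_predict` and `llm_predict` callables are hypothetical stand-ins for the fine-tuned classifier and the GPT-4 call.

```python
# Sketch only: route low-confidence classifier predictions to an LLM.
import torch

CONF_THRESHOLD = 0.9

def classify(texts, roberta_predict, llm_predict):
    """roberta_predict returns softmax probabilities; llm_predict returns a label."""
    final = []
    for text in texts:
        probs = roberta_predict(text)                 # e.g. tensor([0.55, 0.45])
        conf, label = torch.max(probs, dim=-1)
        if conf.item() < CONF_THRESHOLD:
            label = torch.tensor(llm_predict(text))   # defer uncertain cases to the LLM
        final.append(int(label))
    return final
```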
PolyuCBS at SMM4H 2024: LLM-based Medical Disorder and Adverse Drug Event Detection with Low-rank Adaptation
Zhai Yu
|
Xiaoyi Bao
|
Emmanuele Chersoni
|
Beatrice Portelli
|
Sophia Lee
|
Jinghang Gu
|
Chu-Ren Huang
This paper presents the systems and results of our team’s participation in the Social Media Mining for Health (SMM4H) 2024 Shared Task. Our team participated in two tasks: Task 1 and Task 5. Task 5 requires detecting tweets in which users report that their child has a medical disorder. Task 1 requires teams to extract and normalize Adverse Drug Event terms in tweets. The team selected several pre-trained language models and generative Large Language Models to meet these requirements. Strategies to improve performance include cloze tests, prompt engineering, and Low-Rank Adaptation (LoRA). On the test set, our system achieved an F1 score of 0.935, a precision of 0.954, and a recall of 0.917 in Task 5, and an overall F1 score of 0.08 in Task 1.
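A minimal sketch of Low-Rank Adaptation using the PEFT library, since LoRA is named in the abstract; the base model, rank, and target modules here are assumptions rather than the team's actual configuration.

```python
# Sketch only: attach a LoRA adapter so only low-rank update matrices are trained.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")   # assumed stand-in base model
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])          # assumed adapter placement
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # only the adapter weights are trainable
```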
Deloitte at #SMM4H 2024: Can GPT-4 Detect COVID-19 Tweets Annotated by Itself?
Harika Abburi
|
Nirmala Pudota
|
Balaji Veeramani
|
Edward Bowen
|
Sanmitra Bhattacharya
The advent of Large Language Models (LLMs) such as Generative Pre-trained Transformers (GPT-4) marks a transformative era in Natural Language Generation (NLG). These models demonstrate the ability to generate coherent text that closely resembles human-authored content. They are easily accessible and have become invaluable tools for handling various text-based tasks, such as data annotation, report generation, and question answering. In this paper, we investigate GPT-4’s ability to discern between data it has annotated and data annotated by humans, specifically within the context of tweets in the medical domain. Through experimental analysis, we observe that GPT-4 outperforms other state-of-the-art models. The dataset used in this study was provided by the SMM4H (Social Media Mining for Health Research and Applications) shared task. Our model achieved an accuracy of 0.51, securing second rank in the shared task.
IMS_medicALY at #SMM4H 2024: Detecting Impacts of Outdoor Spaces on Social Anxiety with Data Augmented Ensembling
Amelie Wuehrl
|
Lynn Greschner
|
Yarik Menchaca Resendiz
|
Roman Klinger
Many individuals affected by Social Anxiety Disorder turn to social media platforms to share their experiences and seek advice. This includes discussing the potential benefits of engaging with outdoor environments. As part of #SMM4H 2024, Shared Task 3 focuses on classifying the effects of outdoor spaces on social anxiety symptoms in Reddit posts. In our contribution to the task, we explore the effectiveness of domain-specific models (trained on social media data – SocBERT) against general domain models (trained on diverse datasets – BERT, RoBERTa, GPT-3.5) in predicting the sentiment related to outdoor spaces. Further, we assess the benefits of augmenting sparse human-labeled data with synthetic training instances and evaluate the complementary strengths of domain-specific and general classifiers using an ensemble model. Our results show that (1) fine-tuning small, domain-specific models outperforms large general language models in most cases; only one large language model (GPT-4) exhibits performance comparable to the fine-tuned models (52% F1). Further, we find that (2) synthetic data does improve the performance of fine-tuned models in some cases, and (3) the models do not appear to complement each other in our ensemble setup.
1024m at SMM4H 2024: Tasks 3, 5 & 6 - Self Reported Health Text Classification through Ensembles
Ram Kadiyala
|
M.v.p. Rao
Social media is a rich source of data in which users report information regarding their health and how various factors have affected them. This paper presents several approaches using Transformers, Large Language Models, and their ensembles, along with their performance, advantages, and drawbacks, for three SMM4H’24 tasks: classifying texts on the impact of nature and outdoor spaces on the author’s mental health (Task 3), binary classification of tweets reporting children’s health disorders such as asthma, autism, ADHD, and speech disorders (Task 5), and binary classification of users self-reporting their age (Task 6).
Experimenting with Transformer-based and Large Language Models for Classifying Effects of Outdoor Spaces on Social Anxiety in Social Media Data
Falwah Alhamed
|
Julia Ive
|
Lucia Specia
Social Anxiety Disorder (SAD) is a common condition, affecting a significant portion of the population. While research suggests spending time in nature can alleviate anxiety, the specific impact on SAD remains unclear. This study explores the relationship between discussions of outdoor spaces and social anxiety on social media. We leverage transformer-based and large language models (LLMs) to analyze a social media dataset focused on SAD. We developed three methods for the task of predicting the effects of outdoor spaces on SAD in social media. A two-stage pipeline classifier achieved the best performance of our submissions with results exceeding baseline performance.
interrupt-driven@SMM4H’24: Relevance-weighted Sentiment Analysis of Reddit Posts
Jessica Elliott
|
Roland Elliott
This paper describes our approach to Task 3 of the Social Media Mining for Health 2024 (SMM4H’24) shared tasks. The objective of the task was to classify the sentiment of social media posts, taken from the social anxiety subreddit, with reference to the outdoors, as positive, negative, neutral, or unrelated. We classified posts using a relevance-weighted sentiment analysis, which scored poorly, at 0.45 accuracy on the test set and 0.396 accuracy on the evaluation set. We consider what factors contributed to these low scores, and what alternatives could yield improvements, namely: improved data cleaning, a sentiment analyzer trained on a more suitable data set, improved sentiment heuristics, and a more involved relevance-weighting.
IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model
Thadavarthi Sankar
|
Dudekula Suraj
|
Mallamgari Reddy
|
Durga Toshniwal
|
Amit Agarwal
This paper outlines the methodology for the automatic extraction of self-reported ages from social media posts as part of the Social Media Mining for Health (SMM4H) 2024 Workshop Shared Tasks. The focus was on Task 6: “Self-reported exact age classification with cross-platform evaluation in English.” The goal was to accurately identify age-related information from user-generated content, which is crucial for applications in public health monitoring, targeted advertising, and demographic research. A number of transformer-based models were employed, including RoBERTa-Base, BERT-Base, BiLSTM, and Flan T5 Base, leveraging their advanced capabilities in natural language understanding. The training strategies included fine-tuning foundational pre-trained language models and evaluating model performance using standard metrics: F1-score, Precision, and Recall. The experimental results demonstrated that the RoBERTa-Base model significantly outperformed the other models in this classification task. The best results achieved with the RoBERTa-Base model were an F1-score of 0.878, a Precision of 0.899, and a Recall of 0.858.
SMM4H’24 Task6 : Extracting Self-Reported Age with LLM and BERTweet: Fine-Grained Approaches for Social Media Text
Jaskaran Singh
|
Jatin Bedi
|
Maninder Kaur
The paper presents two distinct approaches to Task 6 of the SMM4H’24 workshop: extracting self-reported exact age information from social media posts across platforms. This research task focuses on developing methods for automatically extracting self-reported ages from posts on two prominent social media platforms: Twitter (now X) and Reddit. The work leverages two models, the Mistral-7B-Instruct-v0.2 Large Language Model (LLM) and the pre-trained language model BERTweet, to achieve robust and generalizable age classification, surpassing the limitations of existing methods that rely on predefined age groups. The proposed models aim to advance the automatic extraction of self-reported exact ages from social media posts, enabling more nuanced analyses and insights into user demographics across different platforms.
AAST-NLP@#SMM4H’24: Finetuning Language Models for Exact Age Classification and Effect of Outdoor Spaces on Social Anxiety
Ahmed El-Sayed
|
Omar Nasr
|
Noha Tawfik
This paper evaluates the performance of “AAST-NLP” in the Social Media Mining for Health (SMM4H) Shared Tasks 3 and 6, where more than 20 teams participated in each. We leveraged state-of-the-art transformer-based models, including Mistral, to achieve our results. Our models consistently outperformed both the mean and median scores across the tasks. Specifically, an F1-score of 0.636 was achieved in classifying the impact of outdoor spaces on social anxiety symptoms, while an F1-score of 0.946 was recorded for the classification of self-reported exact ages.
CogAI@SMM4H 2024: Leveraging BERT-based Ensemble Models for Classifying Tweets on Developmental Disorders
Liza Dahiya
|
Rachit Bagga
This paper presents our work for Task 5 of the Social Media Mining for Health Applications 2024 Shared Task: binary classification of English tweets reporting children’s medical disorders. In this paper, we present and compare multiple approaches for automatically classifying tweets from parents based on whether they mention having a child with attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD), delayed speech, or asthma. We use an ensemble of various BERT-based models trained on the provided dataset, which yields an F1 score of 0.901 on the test data.
ADE Oracle at #SMM4H 2024: A Two-Stage NLP System for Extracting and Normalizing Adverse Drug Events from Tweets
Andrew Davis
|
Billy Dickson
|
Sandra Kübler
This study describes the approach of Team ADE Oracle for Task 1 of the Social Media Mining for Health Applications (#SMM4H) 2024 shared task. Task 1 challenges participants to detect adverse drug events (ADEs) within English tweets and normalize these mentions against the Medical Dictionary for Regulatory Activities standards. Our approach utilized a two-stage NLP pipeline consisting of a named entity recognition model, retrained to recognize ADEs, followed by vector similarity assessment with a RoBERTa-based model. Despite achieving a relatively high recall of 37.4% in the extraction of ADEs, indicative of effective identification of potential ADEs, our model encountered challenges with precision. We found marked discrepancies between recall and precision between the test set and our validation set, which underscores the need for further efforts to prevent overfitting and enhance the model’s generalization capabilities for practical applications.
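A minimal sketch of the vector-similarity normalization step described above, using a small off-the-shelf sentence encoder as a stand-in for the team's RoBERTa-based model and a tiny toy subset of MedDRA preferred terms.

```python
# Sketch only: embed an extracted ADE span and candidate MedDRA preferred terms,
# then normalize to the most cosine-similar term. Terms and encoder are illustrative.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-distilroberta-v1")            # stand-in encoder
meddra_pts = ["Headache", "Nausea", "Somnolence", "Dizziness"]   # toy subset of PTs
pt_embs = encoder.encode(meddra_pts, convert_to_tensor=True)

span = "couldn't stop feeling sleepy all day"
span_emb = encoder.encode(span, convert_to_tensor=True)
scores = util.cos_sim(span_emb, pt_embs)[0]
print(meddra_pts[int(scores.argmax())])                          # nearest preferred term
```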
BrainStorm @ iREL at #SMM4H 2024: Leveraging Translation and Topical Embeddings for Annotation Detection in Tweets
Manav Chaudhary
|
Harshit Gupta
|
Vasudeva Varma
The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iREL’s approach to the #SMM4H 2024 Shared Task: leveraging the inherent topical information in tweets, we propose a novel approach to identifying and classifying annotations, aiming to enhance the trustworthiness of annotated data.
UKYNLP@SMM4H2024: Language Model Methods for Health Entity Tagging and Classification on Social Media (Tasks 4 & 5)
Motasem Obeidat
|
Vinu Ekanayake
|
Md Sultan Al Nahian
|
Ramakanth Kavuluru
We describe the methods and results of our submission to the 9th Social Media Mining for Health Research and Applications (SMM4H) 2024 shared tasks 4 and 5. Task 4 involved extracting the clinical and social impacts of non-medical substance use and task 5 focused on the binary classification of tweets reporting children’s medical disorders. We employed encoder language models and their ensembles, achieving the top score on task 4 and a high score for task 5.
LHS712_ADENotGood at #SMM4H 2024 Task 1: Deep-LLMADEminer: A deep learning and LLM pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter
Yifan Zheng
|
Jun Gong
|
Shushun Ren
|
Dalton Simancek
|
V.G.Vinod Vydiswaran
Adverse drug events (ADEs) pose major public health risks, with traditional reporting systems often failing to capture them. Our proposed pipeline, called Deep-LLMADEminer, used natural language processing approaches to tackle this issue for #SMM4H 2024 shared task 1. Using annotated tweets, we built a three-part pipeline: RoBERTa for classification, GPT-4-turbo for span extraction, and BioBERT for normalization. Our models achieved F1-scores of 0.838, 0.306, and 0.354, respectively, offering a novel system for Task 1 and similar pharmacovigilance tasks.
HaleLab_NITK@SMM4H’24: Binary classification of English tweets reporting children’s medical disorders
Ritik Mahajan
|
Sowmya S.
This paper describes the work undertaken as part of the SMM4H-2024 shared task, specifically Task 5, which involves the binary classification of English tweets reporting children’s medical disorders. The primary objective is to develop a system capable of automatically identifying tweets from users who report their pregnancy and mention children with specific medical conditions, such as attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD), delayed speech, or asthma, while distinguishing them from tweets that merely reference a disorder without much context. Our approach leverages advanced natural language processing techniques and machine learning algorithms to accurately classify the tweets. The system achieved an overall F1-score of 0.87, highlighting its robustness and effectiveness in addressing the classification challenge posed by this task.
Team Yseop at #SMM4H 2024: Multilingual Pharmacovigilance Named Entity Recognition and Relation Extraction
Anubhav Gupta
This paper describes three RoBERTa-based systems. The first one recognizes adverse drug events (ADEs) in English tweets and links them with MedDRA concepts; it scored an F1-norm of 40 for Task 1. The second one extracts pharmacovigilance-related named entities in French and scored an F1 of 0.4132 for Task 2a. The third system extracts pharmacovigilance-related named entities and their relations in Japanese; it obtained an F1 of 0.5827 for Task 2a and 0.0301 for Task 2b. The French and Japanese systems are the best-performing systems for Task 2.
KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies
Sumam Francis
|
Marie-Francine Moens
This paper presents our models for the Social Media Mining for Health 2024 shared task, specifically Task 5, which involves classifying tweets reporting a child with childhood disorders (annotated as “1”) versus those merely mentioning a disorder (annotated as “0”). We utilized a classification model enhanced with diverse textual and language model-based augmentations. To ensure quality, we used semantic similarity, perplexity, and lexical diversity as evaluation metrics. Combining supervised contrastive learning and cross-entropy-based learning, our best model, incorporating R-drop and various LM generation-based augmentations, achieved an impressive F1 score of 0.9230 on the test set, surpassing the task mean and median scores.
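A minimal sketch of quality-assured augmentation filtering along the three axes named above (semantic similarity, perplexity, lexical diversity); the specific models, metrics, and thresholds are assumptions, not the paper's.

```python
# Sketch only: keep an augmented tweet if it stays close to its source, is fluent
# under a GPT-2 perplexity check, and has a reasonable type-token ratio.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

sim_model = SentenceTransformer("all-MiniLM-L6-v2")
gpt2_tok = GPT2TokenizerFast.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

def perplexity(text: str) -> float:
    ids = gpt2_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = gpt2(ids, labels=ids).loss
    return float(torch.exp(loss))

def keep(source: str, augmented: str) -> bool:
    sim = float(util.cos_sim(sim_model.encode(source, convert_to_tensor=True),
                             sim_model.encode(augmented, convert_to_tensor=True)))
    ttr = len(set(augmented.lower().split())) / max(len(augmented.split()), 1)
    return sim > 0.7 and perplexity(augmented) < 200 and ttr > 0.5   # assumed cutoffs
```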
LHS712NV at #SMM4H 2024 Task 4: Using BERT to classify Reddit posts on non-medical substance use
Valeria Fraga
|
Neha Nair
|
Dalton Simancek
|
V.G.Vinod Vydiswaran
This paper summarizes our participation in the Shared Task 4 of #SMM4H 2024. Task 4 was a named entity recognition (NER) task identifying clinical and social impacts of non-medical substance use in English Reddit posts. We employed the Bidirectional Encoder Representations from Transformers (BERT) model to complete this task. Our team achieved an F1-score of 0.892 on a validation set and a relaxed F1-score of 0.191 on the test set.
712forTask7 at #SMM4H 2024 Task 7: Classifying Spanish Tweets Annotated by Humans versus Machines with BETO Models
Hafizh Yusuf
|
David Belmonte
|
Dalton Simancek
|
V.G.Vinod Vydiswaran
The goal of Social Media Mining for Health (#SMM4H) 2024 Task 7 was to train a machine learning model that is able to distinguish between annotations made by humans and those made by a Large Language Model (LLM). The dataset consisted of tweets originating from #SMM4H 2023 Task 3, wherein the objective was to extract COVID-19 symptoms in Latin-American Spanish tweets. Due to the lack of additional annotated tweets for classification, we reframed the task using the available tweets and their corresponding human or machine annotator labels to explore differences between the two subsets of tweets. We conducted an exploratory data analysis and trained a BERT-based classifier to identify sampling biases between the two subsets. The exploratory data analysis found no significant differences between the samples and our best classifier achieved a precision of 0.52 and a recall of 0.51, indicating near-random performance. This confirms the lack of sampling biases between the two sets of tweets and is thus a valid dataset for a task designed to assess the authorship of annotations by humans versus machines.
TLab at #SMM4H 2024: Retrieval-Augmented Generation for ADE Extraction and Normalization
Jacob Berkowitz
|
Apoorva Srinivasan
|
Jose Cortina
|
Nicholas Tatonetti
SMM4H 2024 Task 1 is focused on the identification of standardized Adverse Drug Events (ADEs) in tweets. We introduce a novel Retrieval-Augmented Generation (RAG) method, leveraging the capabilities of Llama 3, GPT-4, and the SFR-embedding-mistral model, along with few-shot prompting techniques, to map colloquial tweet language to MedDRA Preferred Terms (PTs) without relying on extensive training datasets. Our method achieved competitive performance, with an F1 score of 0.359 in the normalization task and 0.392 in the named entity recognition (NER) task. Notably, our model demonstrated robustness in identifying previously unseen MedDRA PTs (F1=0.363) greatly surpassing the median task score of 0.141 for such terms.
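A minimal sketch of the few-shot prompt-building step in a retrieval-augmented normalization pipeline like the one described above; the retrieved candidates and example mappings are illustrative, not taken from the paper.

```python
# Sketch only: build a few-shot prompt that asks an LLM to pick one MedDRA
# preferred term from retrieved candidates for a tweet span. All examples assumed.
def build_prompt(span: str, candidates: list[str]) -> str:
    examples = (
        'Span: "felt dizzy after the dose" -> Dizziness\n'
        'Span: "could not sleep at all" -> Insomnia\n'
    )
    options = "\n".join(f"- {c}" for c in candidates)
    return (f"Map the span to one MedDRA preferred term.\n{examples}"
            f'Span: "{span}"\nCandidates:\n{options}\nAnswer:')

print(build_prompt("my skin would not stop itching",
                   ["Pruritus", "Rash", "Urticaria"]))   # candidates from a retriever
```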
BIT@UA at #SMM4H 2024 Tasks 1 and 5: finding adverse drug events and children’s medical disorders in English tweets
Luis Afonso
|
João Almeida
|
Rui Antunes
|
José Oliveira
In this paper we present our proposed systems for Tasks 1 and 5 of the #SMM4H-2024 shared task (Social Media Mining for Health), responsible for identifying health-related aspects in English social media text. Task 1 consisted of identifying text spans mentioning adverse drug events and linking them to unique identifiers from the medical terminology MedDRA, whereas in Task 5 the aim was to distinguish tweets that report a user having a child with a medical disorder from tweets that merely mention a disorder. For Task 1, our system, composed of a pre-trained RoBERTa model and a random forest classifier, achieved entity recognition and normalization F1-scores of 0.397 and 0.295, respectively. In Task 5, we obtained a 0.840 F1-score using a pre-trained BERT model.
FORCE: A Benchmark Dataset for Foodborne Disease Outbreak and Recall Event Extraction from News
Sudeshna Jana
|
Manjira Sinha
|
Tirthankar Dasgupta
The escalating prevalence of food safety incidents within the food supply chain necessitates immediate action to protect consumers. These incidents encompass a spectrum of issues, including food product contamination and deliberate food and feed adulteration for economic gain, leading to outbreaks and recalls. Understanding the origins and pathways of contamination is imperative for prevention and mitigation. In this paper, we introduce FORCE (Foodborne disease Outbreak and ReCall Event extraction from the open web). Our proposed model leverages a multi-tasking sequence labeling architecture in conjunction with transformer-based document embeddings. We have compiled a substantial annotated corpus comprising relevant articles published between 2011 and 2023 to train and evaluate the model. The dataset will be publicly released with the paper. The event detection model demonstrates fair accuracy in identifying food-related incidents and outbreaks associated with organizations, as assessed through cross-validation techniques.
Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese
Lisa Raithel
|
Philippe Thomas
|
Bhuvanesh Verma
|
Roland Roller
|
Hui-Syuan Yeh
|
Shuntaro Yada
|
Cyril Grouin
|
Shoko Wakamiya
|
Eiji Aramaki
|
Sebastian Möller
|
Pierre Zweigenbaum
This paper provides an overview of Task 2 from the Social Media Mining for Health 2024 shared task (#SMM4H 2024), which focused on Named Entity Recognition (NER, Subtask 2a) and the joint task of NER and Relation Extraction (RE, Subtask 2b) for detecting adverse drug reactions (ADRs) in German, Japanese, and French texts written by patients. Participants were challenged with a few-shot learning scenario, necessitating models that can effectively generalize from limited annotated examples. Despite the diverse strategies employed by the participants, the overall performance across submissions from three teams highlighted significant challenges. The results underscored the complexity of extracting entities and relations in multi-lingual contexts, especially from the noisy and informal nature of user-generated content. Further research is required to develop robust systems capable of accurately identifying and associating ADR-related information in low-resource and multilingual settings.
Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP
Dongfang Xu
|
Guillermo Garcia
|
Lisa Raithel
|
Philippe Thomas
|
Roland Roller
|
Eiji Aramaki
|
Shoko Wakamiya
|
Shuntaro Yada
|
Pierre Zweigenbaum
|
Karen O’Connor
|
Sai Samineni
|
Sophia Hernandez
|
Yao Ge
|
Swati Rajwal
|
Sudeshna Das
|
Abeed Sarker
|
Ari Klein
|
Ana Schmidt
|
Vishakha Sharma
|
Raul Rodriguez-Esteban
|
Juan Banda
|
Ivan Amaro
|
Davy Weissenbacher
|
Graciela Gonzalez-Hernandez
For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.