KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies

Sumam Francis, Marie-Francine Moens


Abstract
This paper presents our models for Task 5 of the Social Media Mining for Health 2024 shared task: classifying tweets that report a child with a childhood disorder (annotated as “1”) versus tweets that merely mention a disorder (annotated as “0”). We used a classification model trained with diverse textual and language model-based augmentations. To ensure the quality of the augmentations, we used semantic similarity, perplexity, and lexical diversity as evaluation metrics. Our best model combines supervised contrastive learning with cross-entropy-based learning and incorporates R-drop and various LM generation-based augmentations; it achieved an F1 score of 0.9230 on the test set, surpassing the task mean and median scores.
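The abstract names R-drop as one ingredient of the best model but does not spell out the loss. As a minimal sketch of the general R-drop idea (not the authors' implementation), the snippet below computes the loss for one example from two sets of logits, assumed to come from two dropout-perturbed forward passes of the same classifier: the sum of both cross-entropy terms plus a weighted symmetric KL divergence between the two predicted distributions. The function names and the weight `alpha` are hypothetical, and the supervised contrastive term mentioned in the abstract is deliberately left out.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    """R-drop style loss for one example.

    logits_a, logits_b: outputs of two forward passes with different
    dropout masks (assumed to be provided by the caller).
    alpha: hypothetical weight on the consistency term.
    """
    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    # Cross-entropy on both passes against the gold label.
    ce = -(log_p[label] + log_q[label])
    # Symmetric KL pulls the two predicted distributions together.
    sym_kl = 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))
    return ce + alpha * sym_kl


# Example: binary task ("0" = mention only, "1" = reports a child's disorder).
loss = r_drop_loss([0.2, 1.5], [0.4, 1.1], label=1, alpha=1.0)
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to twice the ordinary cross-entropy, so `alpha` only controls how strongly disagreement between the passes is penalized.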
Anthology ID:
2024.smm4h-1.33
Volume:
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dongfang Xu, Graciela Gonzalez-Hernandez
Venues:
SMM4H | WS
Publisher:
Association for Computational Linguistics
Pages:
142–145
URL:
https://aclanthology.org/2024.smm4h-1.33
Cite (ACL):
Sumam Francis and Marie-Francine Moens. 2024. KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies. In Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, pages 142–145, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies (Francis & Moens, SMM4H-WS 2024)
PDF:
https://aclanthology.org/2024.smm4h-1.33.pdf