IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model

Thadavarthi Sankar, Dudekula Suraj, Mallamgari Reddy, Durga Toshniwal, Amit Agarwal


Abstract
This paper outlines the methodology for the automatic extraction of self-reported ages from social media posts as part of the Social Media Mining for Health (SMM4H) 2024 Workshop Shared Tasks. The focus was on Task 6: “Self-reported exact age classification with cross-platform evaluation in English.” The goal was to accurately identify age-related information in user-generated content, which is crucial for applications in public health monitoring, targeted advertising, and demographic research. Several models were evaluated: the transformer-based RoBERTa-Base, BERT-Base, and Flan T5 Base, along with a recurrent BiLSTM. The training strategy consisted of fine-tuning pre-trained language models and evaluating them with standard metrics: F1-score, Precision, and Recall. In the experiments, the RoBERTa-Base model outperformed the other models on this classification task, achieving an F1-score of 0.878, a Precision of 0.899, and a Recall of 0.858.
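The three reported metrics are linked, since F1 is the harmonic mean of Precision and Recall. The following is a minimal sketch of how they could be computed, not the authors' evaluation script. It assumes a binary labelling in which 1 marks a post containing a self-reported exact age; the function names and the toy labels are hypothetical and serve only as illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy labels (hypothetical): 2 true positives, 1 false positive, 1 false negative.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_scores = precision_recall_f1(toy_true, toy_pred)

# Consistency check on the values reported in the abstract.
reported_f1 = f1_from(0.899, 0.858)
print(round(reported_f1, 3))  # 0.878
```

The final check shows that the reported Precision (0.899) and Recall (0.858) are consistent with the reported F1-score of 0.878.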
Anthology ID:
2024.smm4h-1.23
Volume:
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dongfang Xu, Graciela Gonzalez-Hernandez
Venues:
SMM4H | WS
Publisher:
Association for Computational Linguistics
Pages:
101–105
URL:
https://aclanthology.org/2024.smm4h-1.23
Cite (ACL):
Thadavarthi Sankar, Dudekula Suraj, Mallamgari Reddy, Durga Toshniwal, and Amit Agarwal. 2024. IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model. In Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, pages 101–105, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
IITRoorkee@SMM4H 2024 Cross-Platform Age Detection in Twitter and Reddit Using Transformer-Based Model (Sankar et al., SMM4H-WS 2024)
PDF:
https://aclanthology.org/2024.smm4h-1.23.pdf