Lina Cao


2024

CTYUN-AI@SMM4H-2024: Knowledge Extension Makes Expert Models
Yuming Fan | Dongming Yang | Lina Cao
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This paper explores the potential of social media as a rich source of data for understanding public health trends and behaviors, focusing on emotional well-being and the impact of environmental factors. We employed large language models (LLMs) and developed a suite of knowledge extension techniques to analyze social media content related to mental health issues, specifically examining 1) the effects of outdoor spaces on social anxiety symptoms in Reddit posts, 2) tweets reporting children's medical disorders, and 3) self-reported ages in Twitter and Reddit posts. Our knowledge extension approach encompasses both supervised data (i.e., sample augmentation and cross-task fine-tuning) and unsupervised data (i.e., knowledge distillation and cross-task pre-training), tackling the inherent challenges of sample imbalance and the informality of social media language. The effectiveness of our approach is demonstrated by its performance across multiple tasks (i.e., Tasks 3, 5, and 6) at SMM4H-2024. Notably, we achieved the best performance in all three tasks, underscoring the utility of our models in real-world applications.

2023

Emotion classification on code-mixed text messages via soft prompt tuning
Jinghui Zhang | Dongming Yang | Siyu Bao | Lina Cao | Shunguo Fan
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Emotion classification on code-mixed text messages is challenging because the messages mix multiple languages and contain non-literal cues (e.g., emoticons). To address these problems, we propose a soft prompt tuning method that is lightweight and effective at unlocking the latent abilities of pre-trained language models and improving classification results. First, we transform emoticons into text to make use of the rich emotional information they carry. Then, we apply a variety of templates and verbalizers to improve emotion classification. Extensive experiments show that both transforming emoticons and employing prompt tuning improve performance. Finally, as part of WASSA 2023, we obtain an accuracy of 0.972 in the MLEC track and 0.892 in the MCEC track, placing second in both tracks.