Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss

Zhuoang Cai; Zhenghao Li; Yang Liu; Liyuan Guo; Yangqiu Song

Abstract

Classification tasks often suffer from imbalanced data distribution, which presents challenges in food hazard detection due to severe class imbalances, short and unstructured text, and overlapping semantic categories. In this paper, we present our system for SemEval-2025 Task 9: Food Hazard Detection, which addresses these issues by applying data augmentation techniques to improve classification performance. We utilize transformer-based models, BERT and RoBERTa, as backbone classifiers and explore various data balancing strategies, including random oversampling, Easy Data Augmentation (EDA), and focal loss. Our experiments show that EDA effectively mitigates class imbalance, leading to significant improvements in accuracy and F1 scores. Furthermore, combining focal loss with oversampling and EDA further enhances model robustness, particularly for hard-to-classify examples. These findings contribute to the development of more effective NLP-based classification models for food hazard detection.
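As a rough illustration of two of the balancing strategies named in the abstract, the sketch below shows focal loss for a single example and random oversampling of minority classes in plain Python. It is not the authors' implementation: the function names, the defaults `gamma=2.0` and `alpha=1.0`, and the fixed seed are assumptions made for this example, and a real system would apply the loss to the outputs of a BERT or RoBERTa classifier.

```python
import math
import random
from collections import Counter


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities (must sum to 1, p_t > 0)
    target : index of the true class
    gamma  : focusing parameter; gamma=0 reduces to cross-entropy (assumed default 2.0)
    alpha  : scalar weighting factor (assumed default 1.0)
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    max_count = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # Pool of original examples belonging to this class.
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(max_count - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels
```

With `gamma > 0`, a confidently correct prediction (for example `p_t = 0.9`) is down-weighted far more than an uncertain one (`p_t = 0.3`), which is why focal loss shifts training effort toward hard-to-classify examples.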

Anthology ID: 2025.semeval-1.200
Volume: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 1522–1527
URL: https://aclanthology.org/2025.semeval-1.200/
Cite (ACL): Zhuoang Cai, Zhenghao Li, Yang Liu, Liyuan Guo, and Yangqiu Song. 2025. Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 1522–1527, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss (Cai et al., SemEval 2025)
PDF: https://aclanthology.org/2025.semeval-1.200.pdf
