Ben Phan


2025

pdf bib
MyMy at SemEval-2025 Task 9: A Robust Knowledge-Augmented Data Approach for Reliable Food Hazard Detection
Ben Phan | Jung-Hsien Chiang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The Food Hazard Detection (SemEval-2025 Task 9) advances explainable classification of food-incident reports collected from web sources, including social media and regulatory agency websites, to support timely risk mitigation for public health and the economy. This task is complicated by a highly imbalanced, long-tail label distribution and the need for transparent, reliable AI. We present a robust Knowledge-Augmented Data approach that integrates Retrieval-Augmented Generation (RAG) with domain-specific knowledge from the PubMed API to enrich and balance the training data. Our method leverages domain-specific knowledge to expand datasets and curate high-quality data that enhances overall data integrity. We hypothesize that Knowledge-Augmented Data improves Macro-F1 scores, the primary evaluation metric. Our approach achieved a top-2 ranking across both subtasks, demonstrating its effectiveness in advancing NLP applications for food safety and contributing to more reliable food hazard detection.