Kemalcan Bora
2025
The Impact of Named Entity Recognition on Transformer-Based Multi-Label Dietary Recipe Classification
Kemalcan Bora, Horacio Saggion
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
This research explores the impact of Named Entity Recognition (NER) on transformer-based models for multi-label recipe classification by dietary preference. To support this task, we introduce the NutriCuisine Index: a collection of 23,932 recipes annotated across six dietary categories (Healthy, Vegan, Gluten-Free, Low-Carb, High-Protein, Low-Sugar). Using BERT-base-uncased, RoBERTa-base, and DistilBERT-base-uncased, we evaluate how NER-based preprocessing affects F1-score, Precision, Recall, and Hamming Loss. RoBERTa-base shows significant improvements with NER in F1-score (∆F1 = +0.0147, p < 0.001), Precision, and Recall, while BERT and DistilBERT show no such gains. NER also leads to a slight but statistically significant increase in Hamming Loss (that is, slightly more per-label errors) across all models. These findings highlight the model-dependent impact of NER on classification performance.
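
The four reported metrics can be illustrated with a minimal sketch on toy data. Everything below is an assumption made for illustration: the two example recipes and their predictions are invented, the function names are hypothetical, and micro-averaging is only one possible choice, since the abstract does not state which averaging scheme was used.

```python
# Toy illustration of multi-label metrics over the six dietary categories.
# The data and helper names are hypothetical; they are not from the paper.
LABELS = ["Healthy", "Vegan", "Gluten-Free", "Low-Carb", "High-Protein", "Low-Sugar"]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 (an assumed averaging choice)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two invented recipes, one row each, one column per entry in LABELS.
y_true = [[1, 1, 1, 0, 0, 1], [0, 0, 0, 1, 1, 0]]
y_pred = [[1, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 1]]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Micro P/R/F1:", micro_prf(y_true, y_pred))
```

Hamming Loss counts every wrong label decision, whereas Precision, Recall, and F1 are computed only from true positives, false positives, and false negatives, so the two kinds of metric weigh errors differently.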