Learning on Imbalanced Noisy Data via Debiased Sample Selection and LLM-Driven Annotation

Bo Yuan; Yulin Chen; Yin Zhang


Abstract

Learning with Noisy Labels (LNL) addresses the setting where the collected training set may contain incorrect or corrupted labels. Most existing solutions distinguish clean samples from noisy ones and query human experts to denoise the noisy samples. However, these solutions often rest on the unrealistic assumption that the class distribution is uniform, overlooking the skewed, imbalanced distributions frequently encountered in real-world scenarios. In this setting, we show empirically that previous solutions suffer from both selection bias and training bias, which makes it hard to distinguish clean samples from noisy ones. In this paper, we introduce the imbalanced learning with noisy labels (i-LNL) task, in which a model must learn from noisy labels under imbalanced class distributions. We create a new benchmark (ImbaLNL-Bench), comprising synthetic and real-world datasets, to provide a thorough representation of practical use cases. We also propose DeCo, a collaborative learning framework for i-LNL tasks. Specifically, we first conduct debiased sample selection, consisting of a robust expert model and a debiased-enhanced threshold strategy, to better separate clean samples from noisy samples, especially for the tail classes. We then feed the selected clean samples to large language models (LLMs) acting as active annotators, which re-annotate the noisy samples using in-context learning and thereby reduce human effort. Finally, we employ distinct loss functions suited to subsets with varying degrees of label noise. Extensive experimental results on synthetic and real-world datasets show the effectiveness and superiority of our method.
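
The selection bias mentioned in the abstract can be illustrated with a toy example. The sketch below is not the DeCo threshold strategy, which the abstract does not specify. It assumes the common small-loss heuristic (samples with low training loss are treated as clean) and contrasts a single global loss threshold with a per-class cutoff. The function names, the keep_ratio parameter, and the toy losses are all hypothetical.

```python
from collections import defaultdict


def select_clean_global(losses, threshold):
    """Baseline: one loss threshold shared by all classes."""
    return [i for i, loss in enumerate(losses) if loss < threshold]


def select_clean_per_class(losses, labels, keep_ratio=0.5):
    """Keep the lowest-loss fraction of samples within each observed class.

    At least one sample is kept per class, so tail classes are not
    crowded out by the generally lower losses of head classes.
    """
    by_class = defaultdict(list)
    for idx, (loss, label) in enumerate(zip(losses, labels)):
        by_class[label].append((loss, idx))

    selected = []
    for items in by_class.values():
        items.sort()  # ascending loss; ties broken by index
        n_keep = max(1, int(len(items) * keep_ratio))
        selected.extend(idx for _, idx in items[:n_keep])
    return sorted(selected)


# Toy data (made up): class 0 is the head class, class 1 the tail class.
# Tail samples tend to have higher loss because the model sees them rarely.
losses = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1, 1.4, 2.0]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

global_pick = select_clean_global(losses, threshold=0.5)
per_class_pick = select_clean_per_class(losses, labels, keep_ratio=0.5)
print(global_pick)     # no tail-class sample is selected
print(per_class_pick)  # one tail-class sample (index 6) is selected
```

In this toy setting the global threshold selects only head-class samples, while the per-class cutoff also retains the lowest-loss tail sample. A real system would need a more careful criterion, since a fixed keep ratio ignores how the noise rate varies across classes.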

Anthology ID: 2026.findings-acl.1526
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 30504–30542
URL: https://aclanthology.org/2026.findings-acl.1526/
Cite (ACL): Bo Yuan, Yulin Chen, and Yin Zhang. 2026. Learning on Imbalanced Noisy Data via Debiased Sample Selection and LLM-Driven Annotation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 30504–30542, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Learning on Imbalanced Noisy Data via Debiased Sample Selection and LLM-Driven Annotation (Yuan et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1526.pdf
Checklist: 2026.findings-acl.1526.checklist.pdf
