STGN: an Implicit Regularization Method for Learning with Noisy Labels in Natural Language Processing

Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu


Abstract
Noisy labels are ubiquitous in natural language processing (NLP) tasks. Existing work on learning with noisy labels in NLP is often limited to dedicated tasks or specific training procedures, which makes it hard to apply broadly. To address this issue, SGD noise has been explored as a more general way to alleviate the effect of noisy labels, injecting benign noise into the stochastic gradient descent process. However, previous studies apply identical perturbation to all samples, which may cause overfitting on incorrect samples or inadequate optimization of correct ones. To address this, we propose a novel stochastic tailor-made gradient noise (STGN), which mitigates the effect of inherent label noise by introducing tailor-made benign noise for each sample. Specifically, we investigate multiple principles to precisely and stably discriminate correct samples from incorrect ones, and thus apply different intensities of perturbation to each group. A detailed theoretical analysis shows that STGN has good properties that benefit model generalization. Experiments on three different NLP tasks demonstrate the effectiveness and versatility of STGN. STGN can also boost existing robust training methods.
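The general idea behind tailor-made SGD noise can be sketched as follows. This is an illustrative toy, not the paper's actual algorithm: the loss-based criterion for flagging likely-mislabeled samples, the median threshold, and the noise scales `sigma_clean`/`sigma_noisy` are all assumptions made for the example, standing in for the discrimination principles described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tailored_noise_step(w, X, y, lr=0.1, sigma_clean=0.01, sigma_noisy=0.1):
    """One SGD step on a toy logistic regression, adding per-sample
    Gaussian gradient noise whose scale depends on how likely each
    sample is to be mislabeled (here, crudely proxied by its loss)."""
    p = sigmoid(X @ w)
    per_sample_loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # Per-sample gradient of the logistic loss w.r.t. w, shape (n, d).
    grads = (p - y)[:, None] * X
    # Tailor the noise: high-loss samples are treated as likely noisy
    # and get a stronger perturbation (illustrative heuristic only).
    threshold = np.median(per_sample_loss)
    sigma = np.where(per_sample_loss > threshold, sigma_noisy, sigma_clean)
    noisy_grads = grads + sigma[:, None] * rng.standard_normal(grads.shape)
    return w - lr * noisy_grads.mean(axis=0)

# Usage: one perturbed step on synthetic data.
X = rng.standard_normal((32, 5))
w_true = rng.standard_normal(5)
y = (sigmoid(X @ w_true) > 0.5).astype(float)
w = tailored_noise_step(np.zeros(5), X, y)
```

The contrast with uniform SGD noise is the `sigma` vector: a uniform scheme would use one scalar scale for every sample, whereas here each sample's perturbation intensity is chosen from its own statistics.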
Anthology ID:
2022.emnlp-main.515
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
7587–7598
URL:
https://aclanthology.org/2022.emnlp-main.515
DOI:
10.18653/v1/2022.emnlp-main.515
Cite (ACL):
Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, and Ting Liu. 2022. STGN: an Implicit Regularization Method for Learning with Noisy Labels in Natural Language Processing. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7587–7598, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
STGN: an Implicit Regularization Method for Learning with Noisy Labels in Natural Language Processing (Wu et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.515.pdf