Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation

Dongkyu Lee, Ka Chun Cheung, Nevin Zhang

Abstract
Overconfidence has been shown to impair the generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to the loss function, preventing the model from producing a peaked distribution. Label smoothing smooths target labels with a pre-defined prior label distribution; as a result, a model is trained to maximize the likelihood of predicting the soft label. Nonetheless, the amount of smoothing is identical across all samples and remains fixed throughout training. In other words, label smoothing does not reflect the change in the probability distribution mapped by a model over the course of training. To address this issue, we propose a regularization scheme that makes the smoothing parameter dynamic by taking the model's probability distribution into account, thereby varying the parameter per instance. A model in training thus self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work bridging label smoothing and knowledge distillation, our method utilizes self-knowledge as the prior label distribution when softening target labels, and we present theoretical support for the regularization effects of knowledge distillation and of the dynamic smoothing parameter. Our regularizer is validated comprehensively, and the results show marked improvements in model generalization and calibration, enhancing the robustness and trustworthiness of a model.
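
The abstract describes two components: a per-instance smoothing coefficient derived from the model's own output distribution, and the use of that (detached) distribution as the smoothing prior. The PyTorch sketch below illustrates this general idea under stated assumptions; it is not the authors' released implementation, and the confidence-based scaling rule and the max_eps cap are hypothetical choices made for illustration.

    import torch
    import torch.nn.functional as F

    def adaptive_self_label_smoothing(logits, targets, max_eps=0.2):
        """Illustrative sketch of adaptive label smoothing with self-knowledge.

        logits:  (batch, vocab) unnormalized model scores
        targets: (batch,) gold token indices
        max_eps: upper bound on the per-instance smoothing coefficient
                 (hypothetical hyperparameter, not taken from the paper)
        """
        log_probs = F.log_softmax(logits, dim=-1)
        probs = log_probs.exp()

        # Self-knowledge prior: the model's own distribution, detached so
        # no gradient flows through the smoothing target.
        prior = probs.detach()

        # Per-instance smoothing coefficient (assumed rule): scale eps with
        # the model's confidence in the gold label, so more confident
        # predictions are smoothed harder to curb overconfidence.
        gold_conf = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).detach()
        eps = max_eps * gold_conf  # shape: (batch,)

        # Soft target: per-instance mixture of the one-hot label and the
        # self-knowledge prior.
        one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
        soft_target = (1.0 - eps).unsqueeze(-1) * one_hot \
                      + eps.unsqueeze(-1) * prior

        # Cross-entropy against the soft target.
        return -(soft_target * log_probs).sum(dim=-1).mean()

    if __name__ == "__main__":
        logits = torch.randn(4, 10)           # e.g. 4 tokens, vocabulary of 10
        targets = torch.randint(0, 10, (4,))
        print(adaptive_self_label_smoothing(logits, targets))

With eps fixed and the prior set to a uniform distribution, this reduces to standard label smoothing; the per-instance eps and the self-prior are what make the smoothing adaptive in the sense the abstract describes.
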
Anthology ID:
2022.emnlp-main.664
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9781–9792
URL:
https://aclanthology.org/2022.emnlp-main.664
DOI:
10.18653/v1/2022.emnlp-main.664
Cite (ACL):
Dongkyu Lee, Ka Chun Cheung, and Nevin Zhang. 2022. Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9781–9792, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation (Lee et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.664.pdf