Label Smoothing for Text Mining

Peiyang Liu, Xiangyu Xi, Wei Ye, Shikun Zhang


Abstract
Current text mining models are trained with 0-1 hard labels that indicate whether an instance belongs to a class, ignoring the rich information carried by relevance degrees. Soft labels, which assign each label a varying degree of relevance rather than a binary value, are considered more suitable for describing instances. The process of generating soft labels from hard labels is defined as label smoothing (LS). Classical LS methods focus on universal data mining tasks and thus ignore the valuable text features available in text mining tasks. This paper presents a novel keyword-based LS method that automatically generates soft labels from hard labels by exploiting the relevance between labels and text instances. The generated soft labels are then incorporated into existing models as auxiliary targets during the training stage, improving the models without adding any extra parameters. Results of extensive experiments on text classification and large-scale text retrieval datasets demonstrate that soft labels generated by our method contain rich knowledge of text features and improve the performance of corresponding models under both balanced and unbalanced settings.
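To make the idea concrete, the following is a minimal sketch of label smoothing: classical uniform smoothing redistributes a fraction of the gold label's probability mass evenly, while a keyword-based variant would redistribute it according to label-text relevance. The `relevance` scores and function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def smooth_uniform(hard, eps=0.1):
    # Classical label smoothing: move eps of the probability mass
    # from the one-hot gold label to a uniform distribution.
    k = hard.shape[-1]
    return hard * (1 - eps) + eps / k

def smooth_by_relevance(hard, relevance, eps=0.1):
    # Hypothetical keyword-based variant (illustration only): the
    # eps mass is redistributed in proportion to a per-label
    # relevance score, e.g. keyword overlap between label and text.
    weights = relevance / relevance.sum()
    return hard * (1 - eps) + eps * weights

hard = np.array([0.0, 1.0, 0.0])
print(smooth_uniform(hard))                              # still sums to 1
print(smooth_by_relevance(hard, np.array([0.2, 0.7, 0.1])))
```

Both outputs remain valid probability distributions; the relevance-weighted version keeps more mass on labels that are topically close to the instance, which is the intuition behind exploiting text features during smoothing.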
Anthology ID:
2022.coling-1.193
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2210–2219
URL:
https://aclanthology.org/2022.coling-1.193
Cite (ACL):
Peiyang Liu, Xiangyu Xi, Wei Ye, and Shikun Zhang. 2022. Label Smoothing for Text Mining. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2210–2219, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Label Smoothing for Text Mining (Liu et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.193.pdf
Data
AG News, IMDb Movie Reviews