CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost

Bo Dong; Yiyi Wang; Hanbo Sun; Yunji Wang; Alireza Hashemi; Zheng Du

doi:10.18653/v1/2022.ecnlp-1.5

Abstract

Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive and meta learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on deidentified commercial voice assistant datasets and demonstrate that our model outperforms several SOTA approaches.

Anthology ID: 2022.ecnlp-1.5
Volume: Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Shervin Malmasi, Oleg Rokhlenko, Nicola Ueffing, Ido Guy, Eugene Agichtein, Surya Kallumadi
Venue: ECNLP
Publisher: Association for Computational Linguistics
Pages: 35–43
URL: https://aclanthology.org/2022.ecnlp-1.5
DOI: 10.18653/v1/2022.ecnlp-1.5
Cite (ACL): Bo Dong, Yiyi Wang, Hanbo Sun, Yunji Wang, Alireza Hashemi, and Zheng Du. 2022. CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost. In Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 35–43, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost (Dong et al., ECNLP 2022)
PDF: https://aclanthology.org/2022.ecnlp-1.5.pdf
Video: https://aclanthology.org/2022.ecnlp-1.5.mp4
