Don’t waste a single annotation: improving single-label classifiers through soft labels

Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song


Abstract
In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, annotators of such tasks are asked to provide only a single label for each sample, and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledging that determining the appropriate label can be difficult due to ambiguity and lack of context in the data samples. Rather than discarding the information from such ambiguous annotations, our soft label method makes use of it for training. Our findings indicate that additional annotator information, such as confidence, secondary labels, and disagreement, can be used to generate soft labels effectively. Training classifiers with these soft labels then leads to improved performance and calibration on the hard label test set.
Anthology ID:
2023.findings-emnlp.355
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5347–5355
URL:
https://aclanthology.org/2023.findings-emnlp.355
DOI:
10.18653/v1/2023.findings-emnlp.355
Cite (ACL):
Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, and Xingyi Song. 2023. Don’t waste a single annotation: improving single-label classifiers through soft labels. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5347–5355, Singapore. Association for Computational Linguistics.
Cite (Informal):
Don’t waste a single annotation: improving single-label classifiers through soft labels (Wu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.355.pdf