Noise Correction on Subjective Datasets

Uthman Jinadu, Yi Ding


Abstract
Incorporating every annotator’s perspective is crucial for unbiased data modeling. Annotator fatigue and changing opinions over time can distort dataset annotations. To combat this, we propose learning a more accurate representation of diverse opinions by combining multitask learning with loss-based label correction. We show that our novel formulation cleanly separates agreeing and disagreeing annotations. Furthermore, this method provides a controllable way to encourage or discourage disagreement. We demonstrate that this modification can improve prediction performance in both single-annotator and multi-annotator settings. Lastly, we show that this method remains robust to additional label noise applied to subjective data.
Anthology ID:
2024.acl-long.294
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5385–5395
URL:
https://aclanthology.org/2024.acl-long.294/
DOI:
10.18653/v1/2024.acl-long.294
Cite (ACL):
Uthman Jinadu and Yi Ding. 2024. Noise Correction on Subjective Datasets. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5385–5395, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Noise Correction on Subjective Datasets (Jinadu & Ding, ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.294.pdf