SepLL: Separating Latent Class Labels from Weak Supervision Noise

Andreas Stephan, Vasiliki Kougia, Benjamin Roth


Abstract
In the weakly supervised learning paradigm, labeling functions automatically assign heuristic, often noisy, labels to data samples. In this work, we provide a method for learning from weak labels by separating two types of complementary information associated with the labeling functions: information related to the target label and information specific to one labeling function only. Both types of information are reflected to different degrees by all labeled instances. In contrast to previous works that aimed at correcting or removing wrongly labeled instances, we learn a branched deep model that uses all data as-is, but splits the labeling function information in the latent space. Specifically, we propose the end-to-end model SepLL, which extends a transformer classifier by introducing a latent space for labeling-function-specific and task-specific information. The learning signal is given only by the labeling function matches; no pre-processing or label model is required for our method. Notably, the task prediction is made from the latent layer without any direct task signal. Experiments on Wrench text classification tasks show that our model is competitive with the state of the art and yields a new best average performance.
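The core architectural idea in the abstract, a shared encoding branching into a task-label path and a labeling-function-specific path, whose sum reconstructs the labeling function matches, can be illustrated with a minimal sketch. Everything below (the dimensions, the fixed LF-to-class mapping, the plain linear heads) is a hypothetical toy setup, not the paper's actual implementation:

```python
import random

random.seed(0)

# Hypothetical toy dimensions (not from the paper): 5 labeling functions,
# 2 target classes, 4-dimensional encoder output.
NUM_LFS, NUM_CLASSES, HIDDEN = 5, 2, 4

# Assumed fixed mapping: each labeling function votes for exactly one
# target class (LFs 0-2 -> class 0, LFs 3-4 -> class 1).
LF_TO_CLASS = [0, 0, 0, 1, 1]

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Two heads on top of a shared encoding h: one carrying latent class-label
# information, one carrying labeling-function-specific (noise) information.
W_task = rand_matrix(NUM_CLASSES, HIDDEN)
W_noise = rand_matrix(NUM_LFS, HIDDEN)

def forward(h):
    y = matvec(W_task, h)   # latent task-label logits (no direct supervision)
    z = matvec(W_noise, h)  # labeling-function-specific logits
    # The logit for "LF j fires" combines both branches: the logit of the
    # class that LF j is associated with, plus LF j's own noise term.
    # Only these combined logits would receive a training signal.
    lf_logits = [y[LF_TO_CLASS[j]] + z[j] for j in range(NUM_LFS)]
    return y, lf_logits

h = [random.gauss(0, 1) for _ in range(HIDDEN)]
y, lf_logits = forward(h)
print(len(y), len(lf_logits))  # one logit per class, one per labeling function
```

At inference time, only the task branch `y` would be used, which is how the model can predict task labels despite being trained solely on labeling function matches.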
Anthology ID:
2022.findings-emnlp.288
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3918–3929
URL:
https://aclanthology.org/2022.findings-emnlp.288
DOI:
10.18653/v1/2022.findings-emnlp.288
Cite (ACL):
Andreas Stephan, Vasiliki Kougia, and Benjamin Roth. 2022. SepLL: Separating Latent Class Labels from Weak Supervision Noise. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3918–3929, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
SepLL: Separating Latent Class Labels from Weak Supervision Noise (Stephan et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.288.pdf
Video:
https://aclanthology.org/2022.findings-emnlp.288.mp4