WeaNF: Weak Supervision with Normalizing Flows

Andreas Stephan, Benjamin Roth


Abstract
A popular approach to decrease the need for costly manual annotation of large data sets is weak supervision, which introduces problems of noisy labels, coverage and bias. Methods for overcoming these problems have relied either on discriminative models trained with cost functions specific to weak supervision or, more recently, on generative models that try to model the output of the automatic annotation process. In this work, we explore a novel direction of generative modeling for weak supervision: instead of modeling the output of the annotation process (the labeling function matches), we generatively model the input-side data distributions (the feature space) covered by labeling functions. Specifically, we estimate a density for each weak labeling source, or labeling function, by using normalizing flows. An integral part of our method is the flow-based modeling of multiple simultaneously matching labeling functions, and therefore phenomena such as labeling function overlap and correlations are captured. We analyze the effectiveness and modeling capabilities on various commonly used weak supervision data sets, and show that weakly supervised normalizing flows compare favorably to standard weak supervision baselines.
Anthology ID:
2022.repl4nlp-1.27
Volume:
Proceedings of the 7th Workshop on Representation Learning for NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Sewon Min, Maximilian Mozes, Xiang Lorraine Li, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Laura Rimell, Chris Dyer
Venue:
RepL4NLP
Publisher:
Association for Computational Linguistics
Pages:
269–279
URL:
https://aclanthology.org/2022.repl4nlp-1.27
DOI:
10.18653/v1/2022.repl4nlp-1.27
Cite (ACL):
Andreas Stephan and Benjamin Roth. 2022. WeaNF: Weak Supervision with Normalizing Flows. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 269–279, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
WeaNF: Weak Supervision with Normalizing Flows (Stephan & Roth, RepL4NLP 2022)
PDF:
https://aclanthology.org/2022.repl4nlp-1.27.pdf
Video:
https://aclanthology.org/2022.repl4nlp-1.27.mp4
Data:
IMDb Movie Reviews