Francisco Vargas
2022
Adversarial Concept Erasure in Kernel Space
Shauli Ravfogel
|
Francisco Vargas
|
Yoav Goldberg
|
Ryan Cotterell
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how human-interpretable concepts, such as gender, are encoded in these representations would improve the ability of users to control the content of these representations and analyze the working of the models that rely on them. One prominent approach to the control problem is the identification and removal of linear concept subspaces – subspaces in the representation space that correspond to a given concept. While those are tractable and interpretable, neural network do not necessarily represent concepts in linear subspaces. We propose a kernelization of the recently-proposed linear concept-removal objective, and show that it is effective in guarding against the ability of certain nonlinear adversaries to recover the concept. Interestingly, our findings suggest that the division between linear and nonlinear models is overly simplistic: when considering the concept of binary gender and its neutralization, we do not find a single kernel space that exclusively contains all the concept-related information. It is therefore challenging to protect against all nonlinear adversaries at once.
2020
Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
Francisco Vargas
|
Ryan Cotterell
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Bolukbasi et al. (2016) presents one of the first gender bias mitigation techniques for word embeddings. Their method takes pre-trained word embeddings as input and attempts to isolate a linear subspace that captures most of the gender bias in the embeddings. As judged by an analogical evaluation task, their method virtually eliminates gender bias in the embeddings. However, an implicit and untested assumption of their method is that the bias subspace is actually linear. In this work, we generalize their method to a kernelized, non-linear version. We take inspiration from kernel principal component analysis and derive a non-linear bias isolation technique. We discuss and overcome some of the practical drawbacks of our method for non-linear gender bias mitigation in word embeddings and analyze empirically whether the bias subspace is actually linear. Our analysis shows that gender bias is in fact well captured by a linear subspace, justifying the assumption of Bolukbasi et al. (2016).
2019
Multilingual Factor Analysis
Francisco Vargas
|
Kamen Brestnichki
|
Alex Papadopoulos Korfiatis
|
Nils Hammerla
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
In this work we approach the task of learning multilingual word representations in an offline manner by fitting a generative latent variable model to a multilingual dictionary. We model equivalent words in different languages as different views of the same word generated by a common latent variable representing their latent lexical meaning. We explore the task of alignment by querying the fitted model for multilingual embeddings achieving competitive results across a variety of tasks. The proposed model is robust to noise in the embedding space making it a suitable method for distributed representations learned from noisy corpora.
Search
Fix data
Co-authors
- Ryan Cotterell 2
- Kamen Brestnichki 1
- Yoav Goldberg 1
- Nils Hammerla 1
- Alex Papadopoulos Korfiatis 1
- show all...