Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding

Adly Templeton

doi:10.18653/v1/2021.blackboxnlp-1.12

Abstract

Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently interpretable, we mean a system where each dimension is associated with some human-understandable hint that can describe the meaning of that dimension. In order to create more interpretable word embeddings, we transform pretrained dense word embeddings into sparse embeddings. These new embeddings are inherently interpretable: each of their dimensions is created from and represents a natural language word or specific grammatical concept. We construct these embeddings through sparse coding, where each vector in the basis set is itself a word embedding. Therefore, each dimension of our sparse vectors corresponds to a natural language word. We also show that models trained using these sparse embeddings can achieve good performance and are more interpretable in practice, including through human evaluations.
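As a rough illustration of the idea in the abstract (sparse coding over a basis whose vectors are themselves word embeddings), the sketch below uses greedy matching pursuit on made-up 3-dimensional vectors. It is not the paper's actual procedure: the solver choice, the helper names `dot` and `matching_pursuit`, the toy basis words, and all numbers are assumptions chosen only to show how a dense vector can be rewritten as a short "word equation".

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(target, basis, max_terms=2, tol=1e-9):
    """Greedily approximate `target` as a sparse sum of basis word vectors.

    `basis` maps a word to its (hypothetical) dense embedding.
    Returns (coefficients keyed by basis word, remaining residual).
    """
    residual = list(target)
    coeffs = {}
    for _ in range(max_terms):
        # Find the basis word whose direction best matches the residual.
        best_word, best_score = None, 0.0
        for word, vec in basis.items():
            score = dot(residual, vec) / math.sqrt(dot(vec, vec))
            if abs(score) > abs(best_score):
                best_word, best_score = word, score
        if best_word is None or abs(best_score) < tol:
            break  # nothing left to explain
        vec = basis[best_word]
        # Project the residual onto the chosen vector and subtract it.
        coef = dot(residual, vec) / dot(vec, vec)
        coeffs[best_word] = coeffs.get(best_word, 0.0) + coef
        residual = [r - coef * b for r, b in zip(residual, vec)]
    return coeffs, residual


# Toy data (assumed, not from the paper): each basis vector is a "word embedding".
basis = {
    "royalty": [1.0, 0.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "food": [0.0, 0.0, 1.0],
}
queen = [0.8, 0.6, 0.0]

equation, residual = matching_pursuit(queen, basis)
print(" + ".join(f"{c:.1f} * {w}" for w, c in equation.items()))
```

Each nonzero coefficient corresponds to a natural language word, which is what makes the resulting sparse vector readable. A real basis would be neither this small nor orthogonal, so an actual solver would need to handle correlated basis vectors.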

Anthology ID: 2021.blackboxnlp-1.12
Volume: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad
Venue: BlackboxNLP
Publisher: Association for Computational Linguistics
Pages: 177–191
URL: https://aclanthology.org/2021.blackboxnlp-1.12/
DOI: 10.18653/v1/2021.blackboxnlp-1.12
Cite (ACL): Adly Templeton. 2021. Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 177–191, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding (Templeton, BlackboxNLP 2021)
PDF: https://aclanthology.org/2021.blackboxnlp-1.12.pdf