Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus

Thibault Prouteau, Nicolas Dugué, Nathalie Camelin, Sylvain Meignier


Abstract
Word embedding methods represent words as vectors in a space structured by word co-occurrences, so that words with similar meanings lie close together in this space. These vectors are then provided as input to automatic systems that solve natural language processing problems. Because interpretability is a necessary condition for trusting such systems, the interpretability of embedding spaces, the first link in the chain, is an important issue. In this paper, we thus evaluate the interpretability of vectors extracted with two approaches: SPINE, a k-sparse auto-encoder, and SINr, a graph-based method. This evaluation is based on a Word Intrusion Task with human annotators. It is carried out on a large French corpus and is thus, as far as we know, the first large-scale experiment on word embedding interpretability for this language. Furthermore, contrary to the approaches adopted in the literature, where the evaluation is done on a small sample of frequent words, we consider a more realistic use case in which most of the vocabulary is kept for the evaluation. This shows how difficult the task is, even though SPINE and SINr show some promising results. In particular, SINr results are obtained with very little computation compared to SPINE, while being similarly interpretable.
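The abstract does not spell out how the intrusion items are built, so the following is only a minimal sketch of one common word intrusion design, not the authors' protocol. It assumes an item is made of the top-k words of one embedding dimension plus one intruder that ranks in the bottom half of that dimension and is strongest on a different dimension. The function names `build_intrusion_item` and `intrusion_accuracy`, the parameter `k`, and the toy two-dimensional vectors are all hypothetical and chosen for illustration.

```python
import random


def build_intrusion_item(embeddings, dim, k=4, rng=None):
    """Build one intrusion item for dimension `dim` (illustrative assumption).

    `embeddings` maps each word to a list of floats of equal length.
    Returns (shuffled_words, intruder).
    """
    rng = rng or random.Random(0)
    # Rank the vocabulary by decreasing value on the chosen dimension.
    ranked = sorted(embeddings, key=lambda w: embeddings[w][dim], reverse=True)
    top_words = ranked[:k]
    bottom_half = ranked[len(ranked) // 2:]
    n_dims = len(next(iter(embeddings.values())))
    # Candidate intruders: weak on `dim`, strongest on some other dimension.
    candidates = [
        w for w in bottom_half
        if w not in top_words
        and max(range(n_dims), key=lambda d: embeddings[w][d]) != dim
    ]
    if not candidates:
        raise ValueError("no suitable intruder for this dimension")
    intruder = rng.choice(candidates)
    words = top_words + [intruder]
    rng.shuffle(words)
    return words, intruder


def intrusion_accuracy(answers, intruder):
    """Fraction of annotator answers that picked the true intruder."""
    return sum(answer == intruder for answer in answers) / len(answers)


# Toy vectors, invented for this example: dimension 0 groups animals,
# dimension 1 groups vehicles.
toy = {
    "chat": [0.9, 0.0], "chien": [0.8, 0.1],
    "lapin": [0.7, 0.0], "cheval": [0.6, 0.0],
    "voiture": [0.0, 0.9], "camion": [0.1, 0.8],
    "train": [0.0, 0.7], "avion": [0.0, 0.6],
}
words, intruder = build_intrusion_item(toy, dim=0)
# Three hypothetical annotators: two find the intruder, one picks "chat".
score = intrusion_accuracy([intruder, intruder, "chat"], intruder)
```

Under these assumptions, a dimension counts as more interpretable the more often annotators single out the intruder; how answers are aggregated across dimensions and annotators is left open here.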
Anthology ID:
2022.lrec-1.469
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4414–4419
URL:
https://aclanthology.org/2022.lrec-1.469
Cite (ACL):
Thibault Prouteau, Nicolas Dugué, Nathalie Camelin, and Sylvain Meignier. 2022. Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4414–4419, Marseille, France. European Language Resources Association.
Cite (Informal):
Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus (Prouteau et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.469.pdf