Disentangling lexical and grammatical information in word embeddings

Li Liu, François Lareau


Abstract
To enable finer-grained linguistic analysis, we propose a method for the separation of lexical and grammatical information within contextualized word embeddings. Using CamemBERT embeddings for French, we apply our method to 14,472 inflected word forms extracted from the Lexical Network of French (LN-fr), covering 1,468 nouns, 202 adjectives and 299 verbs inflected via 14 distinct grammatical feature values. Our iterative distillation alternates two steps until convergence: (i) estimating lexical or grammatical vectors by averaging the embeddings of words that share the same lexeme or grammatical feature value, and (ii) isolating the complementary component of each word embedding by subtracting the estimated vector. To assess the quality of the decomposition, we measure whether the resulting lexical and grammatical vectors form more compact clusters within their respective groups and whether their sum better reconstructs the original word embeddings. All evaluations rely on L2 distance. The observed improvements in both clustering and reconstruction accuracy demonstrate the effectiveness of our approach.
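The alternating procedure described in the abstract can be illustrated with a minimal sketch. This is one possible reading of steps (i) and (ii), not the authors' implementation: the function names (group_means, distill, mean_l2), the zero initialization, the convergence test, and the synthetic data standing in for CamemBERT embeddings are all assumptions made for illustration.

```python
import numpy as np

def group_means(vectors, labels):
    """Replace each row by the mean of all rows sharing its label."""
    labels = np.asarray(labels)
    out = np.empty_like(vectors)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = vectors[mask].mean(axis=0)
    return out

def distill(emb, lexemes, features, max_iter=100, tol=1e-8):
    """Alternate averaging and subtraction (assumed reading of the abstract).

    Returns per-word lexical and grammatical vectors whose sum
    approximates the original embeddings.
    """
    emb = np.asarray(emb, dtype=float)
    lex = np.zeros_like(emb)
    gram = np.zeros_like(emb)  # assumption: start with no grammatical estimate
    for _ in range(max_iter):
        # (ii) subtract the grammatical estimate, (i) average per lexeme
        lex = group_means(emb - gram, lexemes)
        # (ii) subtract the lexical estimate, (i) average per feature value
        new_gram = group_means(emb - lex, features)
        converged = np.linalg.norm(new_gram - gram) < tol  # assumed criterion
        gram = new_gram
        if converged:
            break
    return lex, gram

def mean_l2(a, b):
    """Mean row-wise L2 distance between two arrays of vectors."""
    return float(np.linalg.norm(a - b, axis=1).mean())

# Synthetic toy data (hypothetical): every lexeme occurs with every feature
# value, and each embedding is an exact sum of two hidden components.
rng = np.random.default_rng(0)
n_lex, n_feat, dim = 5, 3, 8
hidden_lex = rng.normal(size=(n_lex, dim))
hidden_gram = rng.normal(size=(n_feat, dim))
lexemes = np.repeat(np.arange(n_lex), n_feat)
features = np.tile(np.arange(n_feat), n_lex)
emb = hidden_lex[lexemes] + hidden_gram[features]

lex, gram = distill(emb, lexemes, features)

# Reconstruction error (L2) with plain group averages vs. distilled vectors.
baseline = mean_l2(group_means(emb, lexemes) + group_means(emb, features), emb)
error = mean_l2(lex + gram, emb)
print(f"baseline: {baseline:.4f}  distilled: {error:.2e}")
```

On this idealized data, plain averages double-count the shared means, so their sum reconstructs the embeddings poorly, while the alternating estimates recover them almost exactly. Real contextualized embeddings are not perfectly additive, so the gain there would be smaller and has to be measured empirically, as the paper does.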
Anthology ID:
2025.iwcs-main.28
Volume:
Proceedings of the 16th International Conference on Computational Semantics
Month:
September
Year:
2025
Address:
Düsseldorf, Germany
Editors:
Kilian Evang, Laura Kallmeyer, Sylvain Pogodalla
Venue:
IWCS
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
321–330
URL:
https://aclanthology.org/2025.iwcs-main.28/
Cite (ACL):
Li Liu and François Lareau. 2025. Disentangling lexical and grammatical information in word embeddings. In Proceedings of the 16th International Conference on Computational Semantics, pages 321–330, Düsseldorf, Germany. Association for Computational Linguistics.
Cite (Informal):
Disentangling lexical and grammatical information in word embeddings (Liu & Lareau, IWCS 2025)
PDF:
https://aclanthology.org/2025.iwcs-main.28.pdf