AMenDeD: Modelling Concepts by Aligning Mentions, Definitions and Decontextualised Embeddings

Amit Gajbhiye, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert


Abstract
Contextualised Language Models (LMs) improve on traditional word embeddings by encoding the meaning of words in context. However, such models have also made it possible to learn high-quality decontextualised concept embeddings. Three main strategies for learning such embeddings have thus far been considered: (i) fine-tuning the LM to directly predict concept embeddings from the name of the concept itself, (ii) averaging contextualised representations of mentions of the concept in a corpus, and (iii) encoding definitions of the concept. As these strategies have complementary strengths and weaknesses, we propose to learn a unified embedding space in which all three types of representations can be integrated. We show that this allows us to outperform existing approaches on tasks such as ontology completion, which heavily depends on access to high-quality concept embeddings. We furthermore find that mentions and definitions are well aligned in the resulting space, enabling tasks such as target sense verification, even without the need for any fine-tuning.
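To make strategies (ii) and (iii) concrete, the sketch below shows mention averaging and definition encoding with a generic BERT-style encoder from Hugging Face Transformers. This is not the paper's code: the model choice, mean pooling, the single-mention shortcut (the paper averages over many corpus mentions), and the example sentences are all assumptions of this illustration, and the paper additionally trains an alignment between these spaces, which an off-the-shelf encoder lacks.

```python
# Minimal sketch (assumed setup, not the authors' implementation) of:
#   (ii)  averaging contextualised token vectors of a concept mention, and
#   (iii) encoding a dictionary-style definition.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def mention_embedding(sentence: str, mention: str) -> torch.Tensor:
    """Strategy (ii): average the contextualised sub-token vectors of one
    mention (assumes the mention string occurs in the sentence). Over a
    corpus, one would average such vectors across many mentions."""
    enc = tokenizer(sentence, return_offsets_mapping=True, return_tensors="pt")
    offsets = enc.pop("offset_mapping")[0]
    start = sentence.lower().find(mention.lower())
    end = start + len(mention)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    # Keep sub-tokens whose character span overlaps the mention span;
    # special tokens have (0, 0) offsets and are excluded by e > s.
    mask = [(s < end and e > start and e > s) for s, e in offsets.tolist()]
    return hidden[torch.tensor(mask)].mean(dim=0)

def definition_embedding(definition: str) -> torch.Tensor:
    """Strategy (iii): encode a definition; here we simply mean-pool."""
    enc = tokenizer(definition, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden.mean(dim=0)

# In an aligned space, a mention of "bank" (river sense) should score
# higher against the matching definition than against the financial one.
m = mention_embedding("They had a picnic on the bank of the river.", "bank")
d_river = definition_embedding("the land alongside a river or lake")
d_money = definition_embedding("a financial institution that accepts deposits")
print(torch.cosine_similarity(m, d_river, dim=0).item())
print(torch.cosine_similarity(m, d_money, dim=0).item())
```

Comparing a mention vector against candidate definition vectors in this way is essentially the target sense verification setting mentioned in the abstract; the paper's contribution is learning the shared space in which such comparisons become reliable.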
Anthology ID: 2024.lrec-main.72
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 801–811
URL: https://aclanthology.org/2024.lrec-main.72
Cite (ACL): Amit Gajbhiye, Zied Bouraoui, Luis Espinosa Anke, and Steven Schockaert. 2024. AMenDeD: Modelling Concepts by Aligning Mentions, Definitions and Decontextualised Embeddings. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 801–811, Torino, Italia. ELRA and ICCL.
Cite (Informal): AMenDeD: Modelling Concepts by Aligning Mentions, Definitions and Decontextualised Embeddings (Gajbhiye et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.72.pdf