Unveiling Semantic Information in Sentence Embeddings

Leixin Zhang, David Burian, Vojtěch John, Ondřej Bojar


Abstract
This study evaluates the extent to which semantic information is preserved within sentence embeddings generated by state-of-the-art sentence embedding models: SBERT and LaBSE. Specifically, we analyze 13 semantic attributes in sentence embeddings. Our findings indicate that some semantic features (such as tense-related classes) can be decoded from sentence embedding representations. Additionally, we identify a limitation of current sentence embedding models: inferring meaning beyond the lexical level proves difficult.
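
A minimal sketch of the kind of probing setup the abstract describes: frozen sentence embeddings (here LaBSE, loaded via the sentence-transformers library) are fed to a simple classifier that tries to decode one semantic attribute, such as tense. The model name, toy sentences, and labels below are illustrative assumptions, not the authors' exact data or configuration.

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Toy sentences labeled with a tense-like attribute (illustrative only).
    sentences = [
        "She walked to the station.",
        "He is reading a book.",
        "They will travel tomorrow.",
        "The meeting ended early.",
        "We are cooking dinner now.",
        "I will call you later.",
    ]
    labels = ["past", "present", "future", "past", "present", "future"]

    # Encode sentences with a frozen multilingual sentence encoder (LaBSE).
    encoder = SentenceTransformer("sentence-transformers/LaBSE")
    embeddings = encoder.encode(sentences)

    # Train a lightweight probe on the embeddings; held-out accuracy gives a
    # rough indication of how easily the attribute can be decoded from the
    # representation.
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.5, random_state=0, stratify=labels
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("Probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))

In practice such a probe would be trained and evaluated on a much larger labeled set, once per semantic attribute.
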
Anthology ID: 2024.dmr-1.5
Volume: Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Claire Bonial, Julia Bonn, Jena D. Hwang
Venues: DMR | WS
Publisher: ELRA and ICCL
Pages: 39–47
URL: https://aclanthology.org/2024.dmr-1.5
Cite (ACL): Leixin Zhang, David Burian, Vojtěch John, and Ondřej Bojar. 2024. Unveiling Semantic Information in Sentence Embeddings. In Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024, pages 39–47, Torino, Italia. ELRA and ICCL.
Cite (Informal): Unveiling Semantic Information in Sentence Embeddings (Zhang et al., DMR-WS 2024)
PDF: https://aclanthology.org/2024.dmr-1.5.pdf