Locality Preserving Sentence Encoding

Changrong Min, Yonghe Chu, Liang Yang, Bo Xu, Hongfei Lin


Abstract
Although research on word embeddings has made great progress in recent years, many natural language processing tasks operate at the sentence level, so it is essential to learn sentence embeddings. Recently, Sentence-BERT (SBERT) was proposed to learn sentence-level embeddings, using the inner product (or cosine similarity) to compute semantic similarity between sentences. However, this measure cannot adequately describe the semantic structures among sentences: sentences may lie on a manifold in the ambient space rather than being distributed in a Euclidean space, so cosine similarity cannot approximate distances on the manifold. To tackle this problem, we propose a novel sentence embedding method called Sentence BERT with Locality Preserving (SBERT-LP), which discovers the sentence submanifold in a high-dimensional space and yields a compact sentence representation subspace by locally preserving the geometric structures of sentences. We compare SBERT-LP with several existing sentence embedding approaches from three perspectives: sentence similarity, sentence classification and sentence clustering. Experimental results and case studies demonstrate that our method encodes sentences better in terms of semantic structure.
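The abstract's idea of preserving local geometric structure builds on classical Locality Preserving Projections (LPP). The sketch below is not the paper's actual SBERT-LP implementation; it is a minimal NumPy version of standard LPP, assuming dense sentence embeddings (e.g. SBERT outputs) as input: build a symmetrized k-NN graph with heat-kernel weights, form the graph Laplacian, and solve the generalized eigenproblem for the projection directions.

```python
import numpy as np

def lpp(X, n_components=2, n_neighbors=5, t=1.0):
    """Locality Preserving Projections (a sketch, not the paper's code).

    X: (n_samples, n_features) array, e.g. SBERT sentence embeddings.
    Returns a projection matrix A of shape (n_features, n_components);
    the low-dimensional codes are X @ A.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between embeddings.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # k-nearest-neighbor adjacency with heat-kernel weights, symmetrized.
    idx = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = np.exp(-sq[rows, idx.ravel()] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian encoding local structure
    # Generalized eigenproblem X^T L X a = lambda X^T D X a; the
    # smallest eigenvalues give the locality-preserving directions.
    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # regularize
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(B_mat, A_mat))
    order = np.argsort(eigvals.real)
    return eigvecs[:, order[:n_components]].real
```

Under this formulation, nearby sentences in the original embedding space are pulled close in the projected subspace, which is the "locality preserving" property the abstract refers to.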
Anthology ID:
2021.findings-emnlp.262
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3050–3060
URL:
https://aclanthology.org/2021.findings-emnlp.262
DOI:
10.18653/v1/2021.findings-emnlp.262
Cite (ACL):
Changrong Min, Yonghe Chu, Liang Yang, Bo Xu, and Hongfei Lin. 2021. Locality Preserving Sentence Encoding. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3050–3060, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Locality Preserving Sentence Encoding (Min et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.262.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.262.mp4
Data
MPQA Opinion Corpus, SICK, SST, SentEval