Yonghe Chu


2021

pdf bib
Locality Preserving Sentence Encoding
Changrong Min | Yonghe Chu | Liang Yang | Bo Xu | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2021

Although researches on word embeddings have made great progress in recent years, many tasks in natural language processing are on the sentence level. Thus, it is essential to learn sentence embeddings. Recently, Sentence BERT (SBERT) is proposed to learn embeddings on the sentence level, and it uses the inner product (or, cosine similarity) to compute semantic similarity between sentences. However, this measurement cannot well describe the semantic structures among sentences. The reason is that sentences may lie on a manifold in the ambient space rather than distribute in an Euclidean space. Thus, cosine similarity cannot approximate distances on the manifold. To tackle the severe problem, we propose a novel sentence embedding method called Sentence BERT with Locality Preserving (SBERT-LP), which discovers the sentence submanifold from a high-dimensional space and yields a compact sentence representation subspace by locally preserving geometric structures of sentences. We compare the SBERT-LP with several existing sentence embedding approaches from three perspectives: sentence similarity, sentence classification and sentence clustering. Experimental results and case studies demonstrate that our method encodes sentences better in the sense of semantic structures.