Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

Sihao Chen; Hongming Zhang; Tong Chen; Ben Zhou; Wenhao Yu; Dian Yu; Baolin Peng; Hongwei Wang; Dan Roth; Dong Yu (于东)

doi:10.18653/v1/2024.naacl-long.89

Abstract

We introduce the sub-sentence encoder, a contrastively learned contextual embedding model for fine-grained semantic representation of text. In contrast to the standard practice with sentence embeddings, where the meaning of an entire sequence of text is encoded into a fixed-length vector, the sub-sentence encoder learns to produce distinct contextual embeddings corresponding to different atomic propositions, i.e., atomic units of meaning expressed within a text sequence. The sub-sentence embeddings are contrastively learned to recognize (inferred) semantic equivalence between propositions across different text sequences. Our experiments show the effectiveness of sub-sentence encoders in applications such as retrieving supporting facts for fine-grained text attribution or recognizing the conditional semantic similarity between texts. In practice, we demonstrate that sub-sentence encoders incur the same level of inference cost and space complexity as sentence encoders.
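The idea described in the abstract can be illustrated with a small sketch. The abstract does not say how a proposition is selected within a sentence or which contrastive objective is used, so everything here is an assumption made for illustration: a proposition is marked by a hypothetical 0/1 token mask, its embedding is the mean of the selected token vectors, and the loss is a standard InfoNCE objective with in-batch negatives. The function names `masked_mean_pool`, `cosine`, and `info_nce` are hypothetical and are not taken from the paper or its code.

```python
import math


def masked_mean_pool(token_vectors, mask):
    """Average only the token vectors selected by a 0/1 proposition mask.

    Assumption: a proposition is identified by a binary mask over tokens.
    The output has the same dimension regardless of how many tokens are kept.
    """
    selected = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not selected:
        raise ValueError("mask selects no tokens")
    dim = len(selected[0])
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives: anchors[i] should match positives[i].

    Every other positives[j] (j != i) acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        max_logit = max(logits)
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_norm - logits[i]
    return total / len(anchors)


# Toy contextual token vectors for one three-token sentence (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two hypothetical propositions in the same sentence, each with its own mask.
prop_a = masked_mean_pool(tokens, [1, 0, 0])
prop_b = masked_mean_pool(tokens, [0, 1, 0])

# Matching pairs give a low loss; swapping the positives gives a high loss.
aligned_loss = info_nce([prop_a, prop_b], [prop_a, prop_b])
swapped_loss = info_nce([prop_a, prop_b], [prop_b, prop_a])
```

In this sketch the same sentence yields two different fixed-size vectors, one per mask, which mirrors the contrast the abstract draws with a single sentence-level vector. A trained model would produce the token vectors with a neural encoder; the toy values above only stand in for that step.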

Anthology ID: 2024.naacl-long.89
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1596–1609
URL: https://aclanthology.org/2024.naacl-long.89/
DOI: 10.18653/v1/2024.naacl-long.89
Cite (ACL): Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, and Dong Yu. 2024. Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1596–1609, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations (Chen et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.89.pdf
Video: https://aclanthology.org/2024.naacl-long.89.mp4
