InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings

Xing Wu; Chaochen Gao; Zijia Lin; Jizhong Han; Zhongyuan Wang; Songlin Hu

doi:10.18653/v1/2022.findings-emnlp.223

Abstract

Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer. The constraint brought by this assumption is weak, and a good sentence representation should also be able to reconstruct the original sentence fragments. Therefore, this paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE. InfoCSE forces the representation of [CLS] positions to aggregate denser sentence information by introducing an additional masked language model task and a well-designed network. We evaluate the proposed InfoCSE on several benchmark datasets w.r.t. the semantic textual similarity (STS) task. Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large, achieving state-of-the-art results among unsupervised sentence representation learning methods.
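The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea: a contrastive (InfoNCE-style) loss that pulls together two views of the same sentence, plus an auxiliary masked language model loss added with a weight. The function names, the temperature of 0.05, and the `mlm_weight` parameter are assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(views_a, views_b, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    views_a[i] and views_b[i] are embeddings of two views of sentence i;
    every other entry of views_b acts as an in-batch negative.
    The temperature value is an assumed default, not taken from the paper.
    """
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denominator - logits[i]
    return total / len(views_a)


def combined_loss(contrastive, mlm, mlm_weight=1.0):
    """Hypothetical weighted sum of the contrastive loss and an auxiliary MLM loss."""
    return contrastive + mlm_weight * mlm


# Toy batch: two sentences with 2-dimensional embeddings.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce_loss(aligned, aligned)  # matching views line up
loss_swapped = info_nce_loss(aligned, swapped)  # matching views are mismatched
```

In this toy batch the loss is much lower when each sentence's two views line up than when they are swapped, which is the behaviour a contrastive objective is meant to reward.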

Anthology ID: 2022.findings-emnlp.223
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3060–3070
URL: https://aclanthology.org/2022.findings-emnlp.223/
DOI: 10.18653/v1/2022.findings-emnlp.223
Cite (ACL): Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, and Songlin Hu. 2022. InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3060–3070, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings (Wu et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.223.pdf