Unsupervised Sentence Representation Learning with Syntactically Aligned Negative Samples

Zhilan Wang, Zekai Zhi, Rize Jin, Kehui Song, He Wang, Da-Jung Cho


Abstract
Sentence representation learning benefits from data augmentation strategies to improve model performance and generalization, yet existing approaches often encounter issues such as semantic inconsistencies and feature suppression. To address these limitations, we propose a method for generating Syntactically Aligned Negative (SAN) samples through a semantic importance-aware Masked Language Model (MLM) approach. Our method quantifies semantic contributions of individual words to produce negative samples that have substantial textual overlap with the original sentences while conveying different meanings. We further introduce Hierarchical-InfoNCE (HiNCE), a novel contrastive learning objective employing differential temperature weighting to optimize the utilization of both in-batch and syntactically aligned negative samples. Extensive evaluations across seven semantic textual similarity benchmarks demonstrate consistent improvements over state-of-the-art models.
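As a rough illustration of what "differential temperature weighting" over two kinds of negatives could look like, the sketch below computes an InfoNCE-style loss for one anchor sentence, using one temperature for in-batch negatives and a separate one for syntactically aligned negatives. This is an assumption-laden sketch, not the HiNCE objective from the paper: the abstract does not give the formula, and the function name, the parameter names, and the default temperatures (tau, tau_aligned) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def two_temperature_info_nce(anchor, positive, in_batch_negs, aligned_negs,
                             tau=0.05, tau_aligned=0.1):
    """InfoNCE-style loss for a single anchor embedding.

    Hypothetical formulation: in-batch negatives are scaled by `tau`,
    aligned (hard) negatives by `tau_aligned`. Both defaults are
    illustrative values, not taken from the paper.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    batch = sum(math.exp(cosine(anchor, n) / tau) for n in in_batch_negs)
    aligned = sum(math.exp(cosine(anchor, n) / tau_aligned) for n in aligned_negs)
    # Negative log of the positive pair's share of the total mass.
    return -math.log(pos / (pos + batch + aligned))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    positive = [0.9, 0.1]
    in_batch = [[0.0, 1.0]]
    aligned = [[0.8, 0.2]]  # embedding close to the anchor, standing in for a hard negative
    print(two_temperature_info_nce(anchor, positive, in_batch, aligned))
```

In this toy form, an aligned negative whose embedding sits close to the anchor raises the loss more than a distant one, and changing tau_aligned relative to tau changes how strongly such negatives are penalized.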
Anthology ID:
2025.findings-naacl.461
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8247–8259
URL:
https://aclanthology.org/2025.findings-naacl.461/
Cite (ACL):
Zhilan Wang, Zekai Zhi, Rize Jin, Kehui Song, He Wang, and Da-Jung Cho. 2025. Unsupervised Sentence Representation Learning with Syntactically Aligned Negative Samples. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8247–8259, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Sentence Representation Learning with Syntactically Aligned Negative Samples (Wang et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-naacl.461.pdf