From Semantics to Style: A Cross-Dataset Comparative Framework for Sentence Similarity Predictions

Yusuke Yamauchi; Akiko Aizawa

Abstract

While the Semantic Textual Similarity (STS) task serves as a cornerstone embedding task in natural language processing, the definition of similarity is inherently ambiguous and dataset-specific. Comprehensive cross-dataset analysis remains scarce, leaving it uncertain whether language models perceive diverse semantic and stylistic nuances as humans do. To address this, we propose a comparative framework that uses lightweight poolers on a frozen encoder to conduct a unified analysis across STS, Paraphrase Identification (PI), and Triplet datasets. Experimental results on 21 datasets indicate a high correlation of semantic concepts between the STS and PI settings, while highlighting style as a distinct dimension that requires explicit separation from semantics. Moreover, Procrustes, layer-wise, and hierarchical clustering analyses elucidate the varying properties of these concepts and the structural organization of the embedding space. These insights imply that treating semantics and style as separate components in embedding models is crucial for enhancing both interpretability and practical utility.
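
The abstract does not say which Procrustes variant the authors use, so the following is only a generic sketch of orthogonal Procrustes alignment, one common way to compare two embedding spaces built from the same sentences. The function name `procrustes_alignment`, the centering and unit-norm scaling, and the random matrices standing in for pooler outputs are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def procrustes_alignment(X, Y):
    """Align X to Y with an orthogonal map (illustrative sketch, not the paper's code).

    X, Y: arrays of shape (n_sentences, dim) holding embeddings of the same
    sentences from two hypothetical poolers.
    Returns (R, residual): R is the orthogonal matrix minimizing
    ||Xc @ R - Yc||_F, and residual is the squared Frobenius error after
    centering and unit-norm scaling (0 = identical up to rotation, max 2).
    """
    # Center each space and scale it to unit Frobenius norm so that the
    # residual is comparable across pairs of spaces.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)
    Yc = Yc / np.linalg.norm(Yc)

    # Closed-form solution: R = U @ Vt, where U, S, Vt = SVD(Xc^T Yc).
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt

    residual = np.linalg.norm(Xc @ R - Yc) ** 2
    return R, residual


# Hypothetical stand-ins for two pooler outputs over the same 50 sentences.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(50, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
emb_rotated = emb_a @ rotation          # same geometry, rotated basis
emb_unrelated = rng.normal(size=(50, 8))  # unrelated geometry

_, res_rotated = procrustes_alignment(emb_a, emb_rotated)
_, res_unrelated = procrustes_alignment(emb_a, emb_unrelated)
print(f"rotated copy residual:   {res_rotated:.6f}")
print(f"unrelated space residual: {res_unrelated:.6f}")
```

Under these assumptions, a residual near zero means the two spaces differ only by a rotation or reflection, while a larger residual suggests the poolers organize the sentences differently.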

Anthology ID: 2026.findings-eacl.95
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1848–1877
URL: https://aclanthology.org/2026.findings-eacl.95/
Cite (ACL): Yusuke Yamauchi and Akiko Aizawa. 2026. From Semantics to Style: A Cross-Dataset Comparative Framework for Sentence Similarity Predictions. In Findings of the Association for Computational Linguistics: EACL 2026, pages 1848–1877, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): From Semantics to Style: A Cross-Dataset Comparative Framework for Sentence Similarity Predictions (Yamauchi & Aizawa, Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.95.pdf
Checklist: 2026.findings-eacl.95.checklist.pdf
