Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics

Zihao Wang, Jiaheng Dou, Yong Zhang


Abstract
Measuring Sentence Textual Similarity (STS) is a classic task that can be applied to many downstream NLP applications such as text generation and retrieval. In this paper, we focus on unsupervised STS that works across various domains while requiring only minimal data and computational resources. Theoretically, we propose a lightweight Expectation-Correction (EC) formulation for STS computation. The EC formulation unifies unsupervised STS approaches, including the cosine similarity of Additively Composed (AC) sentence embeddings, Optimal Transport (OT), and Tree Kernels (TK). Moreover, we propose the Recursive Optimal Transport Similarity (ROTS) algorithm, which captures compositional phrase semantics by composing multiple recursive EC formulations. ROTS runs in linear time and is faster than its predecessors; empirically, it is also more effective and scalable than previous approaches. Extensive experiments on 29 STS tasks under various settings show a clear advantage of ROTS over existing approaches. Detailed ablation studies support the effectiveness of our approaches.
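The abstract names the cosine similarity of Additively Composed (AC) sentence embeddings as one of the approaches the EC formulation unifies. The sketch below illustrates only that baseline idea, not ROTS or the EC formulation itself. The function names, the optional per-word weights, and the toy 3-dimensional word vectors are all assumptions made for illustration; none of them come from the paper or its code release.

```python
import math

# Toy 3-d word vectors, invented for illustration (not from any trained model).
TOY_VECTORS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "sleeps": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}


def additive_sentence_embedding(tokens, word_vectors, weights=None):
    """Sum (optionally weighted) word vectors into one sentence vector."""
    dim = len(next(iter(word_vectors.values())))
    sentence = [0.0] * dim
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        # Hypothetical per-word weight; defaults to 1.0 (plain sum).
        w = 1.0 if weights is None else weights.get(token, 1.0)
        for i, x in enumerate(vec):
            sentence[i] += w * x
    return sentence


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ac_similarity(sentence_a, sentence_b, word_vectors=TOY_VECTORS):
    """Similarity of two whitespace-tokenized sentences via AC embeddings."""
    emb_a = additive_sentence_embedding(sentence_a.split(), word_vectors)
    emb_b = additive_sentence_embedding(sentence_b.split(), word_vectors)
    return cosine_similarity(emb_a, emb_b)
```

With these toy vectors, `ac_similarity("cat sleeps", "dog sleeps")` scores higher than `ac_similarity("cat sleeps", "car")`, because the first pair shares a word and has nearby vectors for the other. A real setup would presumably replace `TOY_VECTORS` with pretrained embeddings.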
Anthology ID:
2022.coling-1.441
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4976–4995
URL:
https://aclanthology.org/2022.coling-1.441
Cite (ACL):
Zihao Wang, Jiaheng Dou, and Yong Zhang. 2022. Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4976–4995, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Unsupervised Sentence Textual Similarity with Compositional Phrase Semantics (Wang et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.441.pdf
Code:
zihao-wang/rots