Matching Varying-Length Texts via Topic-Informed and Decoupled Sentence Embeddings

Xixi Zhou, Chunbin Gu, Xin Jie, Jiajun Bu, Haishuai Wang


Abstract
Measuring semantic similarity between texts is a crucial task in natural language processing. While existing work on semantic text matching focuses on pairs of similar-length sequences, matching texts of non-comparable lengths has broader applications in specific domains, such as comparing professional document summaries with full document content. Current approaches struggle with such pairs because longer texts must be truncated. To address this, we split texts into natural sentences and decouple sentence representations using supervised contrastive learning (SCL). Meanwhile, we adopt the embedded topic model (ETM) for domain-specific data. Our experiments on three well-studied datasets demonstrate that our model, based on decoupled and topic-informed sentence embeddings, effectively matches texts of significantly different lengths.
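The abstract mentions decoupling sentence representations with supervised contrastive learning. As an illustration only (not the paper's exact formulation), the standard supervised contrastive objective over a batch of sentence embeddings can be sketched in NumPy; the function name, temperature value, and label scheme below are assumptions for the sketch:

```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    """Generic supervised contrastive loss over a batch of sentence
    embeddings: pull same-label sentences together, push others apart.
    Illustrative sketch, not the method from Zhou et al. (2024)."""
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau                         # temperature-scaled similarities
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    # log-softmax over all other samples in the batch (exclude self)
    sim_masked = np.where(eye, -np.inf, sim)
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & ~eye   # same-label pairs
    # negate the mean log-probability of each anchor's positives
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

With this loss, embeddings that cluster by label yield a lower value than the same embeddings under mismatched labels, which is the behavior a contrastive decoupling step relies on.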
Anthology ID:
2024.findings-naacl.81
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1274–1280
URL:
https://aclanthology.org/2024.findings-naacl.81
DOI:
10.18653/v1/2024.findings-naacl.81
Cite (ACL):
Xixi Zhou, Chunbin Gu, Xin Jie, Jiajun Bu, and Haishuai Wang. 2024. Matching Varying-Length Texts via Topic-Informed and Decoupled Sentence Embeddings. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1274–1280, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Matching Varying-Length Texts via Topic-Informed and Decoupled Sentence Embeddings (Zhou et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.81.pdf