Unsupervised Partial Sentence Matching for Cited Text Identification

Kathryn Ricci, Haw-Shiuan Chang, Purujit Goyal, Andrew McCallum


Abstract
Given a citation in the body of a research paper, cited text identification aims to find the sentences in the cited paper that are most relevant to the citing sentence. The task is fundamentally one of sentence matching, where affinity is often assessed by cosine similarity between sentence embeddings. However, (a) sentences may not be well represented by a single embedding because they contain multiple distinct semantic aspects, and (b) good matches may not require a strong match in all aspects. To overcome these limitations, we propose a simple and efficient unsupervised method for cited text identification that adapts an asymmetric similarity measure to allow partial matches of multiple aspects in both sentences. On the CL-SciSumm dataset, we find that our method outperforms a baseline symmetric approach and, surprisingly, also outperforms all supervised and unsupervised systems submitted to past editions of CL-SciSumm Shared Task 1a.
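To make the contrast in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each sentence is already represented as a small matrix of "aspect" vectors (for example, token embeddings); the function names, the max-then-top-fraction pooling, and the `keep` parameter are all hypothetical choices made for illustration. The symmetric baseline compares one pooled embedding per sentence, while the asymmetric score lets each citing-side aspect pick its best cited-side aspect and averages only the best-matched fraction, so neither sentence has to match in every aspect.

```python
import numpy as np


def _unit_rows(x):
    """L2-normalise each row; all-zero rows are left as zeros."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0.0, 1.0, norms)


def symmetric_score(citing, cited):
    """Baseline: cosine similarity between mean-pooled sentence embeddings."""
    a = _unit_rows(np.mean(citing, axis=0, keepdims=True))
    b = _unit_rows(np.mean(cited, axis=0, keepdims=True))
    return float((a @ b.T)[0, 0])


def partial_match_score(citing, cited, keep=0.5):
    """Hypothetical asymmetric partial-match score (an assumption, not the
    paper's exact measure).

    Each citing-side aspect is matched to its most similar cited-side
    aspect, and only the best-matched `keep` fraction of citing-side
    aspects is averaged.
    """
    sims = _unit_rows(citing) @ _unit_rows(cited).T  # (n_citing, n_cited)
    best = np.sort(sims.max(axis=1))[::-1]           # best match per aspect, descending
    k = max(1, int(np.ceil(keep * len(best))))       # number of aspects to keep
    return float(best[:k].mean())
```

With toy inputs such as `citing = np.eye(3)[[0, 1]]` and `cited = np.eye(3)[[0, 2]]`, the two sentences share one aspect out of two: the pooled cosine is diluted by the unshared aspects, whereas the partial-match score with `keep=0.5` reflects only the shared one. Swapping the argument order generally changes the partial-match score, which is what makes it asymmetric.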
Anthology ID:
2022.sdp-1.11
Volume:
Proceedings of the Third Workshop on Scholarly Document Processing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Arman Cohan, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Drahomira Herrmannova, Petr Knoth, Kyle Lo, Philipp Mayr, Michal Shmueli-Scheuer, Anita de Waard, Lucy Lu Wang
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
95–104
URL:
https://aclanthology.org/2022.sdp-1.11
Cite (ACL):
Kathryn Ricci, Haw-Shiuan Chang, Purujit Goyal, and Andrew McCallum. 2022. Unsupervised Partial Sentence Matching for Cited Text Identification. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 95–104, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Partial Sentence Matching for Cited Text Identification (Ricci et al., sdp 2022)
PDF:
https://aclanthology.org/2022.sdp-1.11.pdf