Beyond Retrieval: Topic-based Alignment of Scientific Papers to Research Proposal

Rudra Palit, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff


Abstract
A research agenda typically begins with a comprehensive research proposal. The proposal's effectiveness often hinges on how well it connects to the existing scientific literature that supports its ideas. To assess the relevance of existing articles to a research proposal, it is necessary to categorize these articles into high-level thematic groups, referred to as topics, that align with the proposal. This paper introduces a novel task of aligning scientific articles relevant to a proposal with researcher-provided proposal topics. We also construct a dataset to serve as a benchmark for this task. We establish human and Large Language Model (LLM) baselines and propose a novel three-stage approach to the task. We synthesize pseudo-labels that map proposal topics to text spans from cited articles, and use them to train Language Models (LMs) for two purposes: (i) as a retriever, to extract relevant text spans from cited articles for each topic, and (ii) as a classifier, to categorize the articles into the proposal topics. Our retriever-classifier pipeline, which employs very small open-source LMs fine-tuned on our constructed dataset, achieves results comparable to a vanilla paid LLM-based classifier, demonstrating its efficacy. However, a notable gap of 23.57 F1 points between our approach and the human baseline highlights the complexity of this task and the need for further research.
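The retrieve-then-classify pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper fine-tunes small LMs on synthesized pseudo-labels, whereas the sketch below substitutes a plain bag-of-words cosine similarity for both the retriever and the classifier. The function names (`retrieve`, `classify`), the `k` and `threshold` parameters, and the sample texts are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math
import re


def tokens(text: str) -> Counter:
    """Lowercased bag-of-words counts (a stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(topic: str, spans: list[str], k: int = 1) -> list[str]:
    """Retriever stage: return the k article spans most similar to the topic."""
    t = tokens(topic)
    return sorted(spans, key=lambda s: cosine(t, tokens(s)), reverse=True)[:k]


def classify(topics: list[str], spans: list[str], k: int = 1,
             threshold: float = 0.2) -> list[str]:
    """Classifier stage: assign the article to each topic whose best
    retrieved span scores at or above the (assumed) threshold."""
    assigned = []
    for topic in topics:
        top = retrieve(topic, spans, k)
        score = max((cosine(tokens(topic), tokens(s)) for s in top), default=0.0)
        if score >= threshold:
            assigned.append(topic)
    return assigned


# Hypothetical proposal topics and spans from one cited article.
topics = ["dense passage retrieval", "protein folding"]
spans = [
    "We train a dense retrieval model over passage embeddings.",
    "Results are reported on three benchmarks.",
]
labels = classify(topics, spans)
```

In this toy setup an article can receive zero, one, or several topics, which is one plausible reading of the task; the abstract does not state whether the labelling is single-label or multi-label.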
Anthology ID:
2024.sdp-1.7
Volume:
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Tirthankar Ghosal, Amanpreet Singh, Anita de Waard, Philipp Mayr, Aakanksha Naik, Orion Weller, Yoonjoo Lee, Shannon Shen, Yanxia Qin
Venues:
sdp | WS
Publisher:
Association for Computational Linguistics
Pages:
70–83
URL:
https://aclanthology.org/2024.sdp-1.7
Cite (ACL):
Rudra Palit, Manasi Patwardhan, Lovekesh Vig, and Gautam Shroff. 2024. Beyond Retrieval: Topic-based Alignment of Scientific Papers to Research Proposal. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 70–83, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Beyond Retrieval: Topic-based Alignment of Scientific Papers to Research Proposal (Palit et al., sdp-WS 2024)
PDF:
https://aclanthology.org/2024.sdp-1.7.pdf