Multi-objective Representation Learning for Scientific Document Retrieval

Mathias Parisot, Jakub Zavrel


Abstract
Existing dense retrieval models for scientific documents have been optimized either for retrieval by short queries or for document similarity, but usually not for both. In this paper, we explore combinations of multiple training objectives to obtain a single representation model that balances both modes of dense retrieval. We combine the relevance judgements from MS MARCO with the citation similarity of SPECTER and the self-supervised objective of independent cropping. We also consider adding training data from document co-citation in a sentence context and domain-specific synthetic data. We show that combining multiple objectives yields models that generalize well across different benchmark tasks, improving by up to 73% over models trained on a single objective.
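
As a rough illustration of what combining objectives can look like, the sketch below computes an in-batch contrastive (InfoNCE-style) loss per data source and sums the results with fixed weights. This is an assumption made for illustration only: the abstract does not state the loss function or the weighting scheme, and the names `info_nce`, `multi_objective_loss`, the objective labels, and the weights are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(queries, positives, temperature=1.0):
    """In-batch contrastive loss (illustrative, not the paper's exact loss).

    queries[i] should score higher against positives[i] than against the
    positives of the other items in the batch.
    """
    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the matching positive.
        total += log_z - scores[i]
    return total / len(queries)


def multi_objective_loss(batches, weights):
    """Weighted sum of per-objective losses (weights are assumed, not sourced).

    batches maps an objective label to a (queries, positives) pair of
    embedding lists produced by the same encoder.
    """
    return sum(
        weights[name] * info_nce(queries, positives)
        for name, (queries, positives) in batches.items()
    )


# Toy 2-d embeddings standing in for encoder outputs.
aligned = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
batches = {"relevance": aligned, "citation": aligned, "cropping": aligned}
weights = {"relevance": 0.5, "citation": 0.25, "cropping": 0.25}
loss = multi_objective_loss(batches, weights)
```

In this sketch each objective contributes query/positive pairs from a different source (for example query and relevant passage, paper and cited paper, or two crops of one document), while a single shared encoder would produce all embeddings.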
Anthology ID:
2022.sdp-1.9
Volume:
Proceedings of the Third Workshop on Scholarly Document Processing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Arman Cohan, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Drahomira Herrmannova, Petr Knoth, Kyle Lo, Philipp Mayr, Michal Shmueli-Scheuer, Anita de Waard, Lucy Lu Wang
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
80–88
URL:
https://aclanthology.org/2022.sdp-1.9
Cite (ACL):
Mathias Parisot and Jakub Zavrel. 2022. Multi-objective Representation Learning for Scientific Document Retrieval. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 80–88, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Multi-objective Representation Learning for Scientific Document Retrieval (Parisot & Zavrel, sdp 2022)
PDF:
https://aclanthology.org/2022.sdp-1.9.pdf
Code:
zetaalphavector/multi-obj-repr-learning
Data:
BEIR, MS MARCO, SciDocs, SciFact