MISTI: Metadata-Informed Scientific Text and Image Representation through Contrastive Learning

Pawin Taechoyotin, Daniel Acuna


Abstract
In scientific publications, automatic representations of figures and their captions can be used in NLP, computer vision, and information retrieval tasks. Contrastive learning has proven effective for creating such joint representations for natural scenes, but its application to scientific imagery and descriptions remains under-explored. Recent open-access publication datasets provide an opportunity to understand the effectiveness of this technique as well as evaluate the usefulness of additional metadata, which are available only in the scientific context. Here, we introduce MISTI, a novel model that uses contrastive learning to simultaneously learn the representation of figures, captions, and metadata, such as a paper’s title, sections, and curated concepts from the PubMed Open Access Subset. We evaluate our model on multiple information retrieval tasks, showing substantial improvements over baseline models. Notably, incorporating metadata doubled retrieval performance, achieving a Recall@1 of 30% on a 70K-item caption retrieval task. We qualitatively explore how metadata can be used to strategically retrieve distinctive representations of the same concept but for different sections, such as introduction and results. Additionally, we show that our model seamlessly handles out-of-domain tasks related to image segmentation. We share our dataset and methods (https://github.com/Khempawin/scientific-image-caption-pair/tree/section-attr) and outline future research directions.
Anthology ID:
2024.sdp-1.15
Volume:
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Tirthankar Ghosal, Amanpreet Singh, Anita Waard, Philipp Mayr, Aakanksha Naik, Orion Weller, Yoonjoo Lee, Shannon Shen, Yanxia Qin
Venues:
sdp | WS
Publisher:
Association for Computational Linguistics
Pages:
155–164
URL:
https://aclanthology.org/2024.sdp-1.15
Cite (ACL):
Pawin Taechoyotin and Daniel Acuna. 2024. MISTI: Metadata-Informed Scientific Text and Image Representation through Contrastive Learning. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 155–164, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
MISTI: Metadata-Informed Scientific Text and Image Representation through Contrastive Learning (Taechoyotin & Acuna, sdp-WS 2024)
PDF:
https://aclanthology.org/2024.sdp-1.15.pdf