@inproceedings{kanakarajan-etal-2022-biosimcse,
    title     = {{BioSimCSE}: {BioMedical} Sentence Embeddings using Contrastive learning},
    author    = {Kanakarajan, Kamal raj and
                 Kundumani, Bhuvana and
                 Abraham, Abhijith and
                 Sankarasubbu, Malaikannan},
    editor    = {Lavelli, Alberto and
                 Holderness, Eben and
                 Jimeno Yepes, Antonio and
                 Minard, Anne-Lyse and
                 Pustejovsky, James and
                 Rinaldi, Fabio},
    booktitle = {Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)},
    month     = dec,
    year      = {2022},
    address   = {Abu Dhabi, United Arab Emirates (Hybrid)},
    publisher = {Association for Computational Linguistics},
    url       = {https://aclanthology.org/2022.louhi-1.10},
    doi       = {10.18653/v1/2022.louhi-1.10},
    pages     = {81--86},
    abstract  = {Sentence embeddings in the form of fixed-size vectors that capture the information in the sentence as well as the context are critical components of Natural Language Processing systems. With transformer model based sentence encoders outperforming the other sentence embedding methods in the general domain, we explore the transformer based architectures to generate dense sentence embeddings in the biomedical domain. In this work, we present BioSimCSE, where we train sentence embeddings with domain specific transformer based models with biomedical texts. We assess our model{'}s performance with zero-shot and fine-tuned settings on Semantic Textual Similarity (STS) and Recognizing Question Entailment (RQE) tasks. Our BioSimCSE model using BioLinkBERT achieves state of the art (SOTA) performance on both tasks.},
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kanakarajan-etal-2022-biosimcse">
<titleInfo>
<title>BioSimCSE: BioMedical Sentence Embeddings using Contrastive learning</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kamal</namePart>
<namePart type="given">raj</namePart>
<namePart type="family">Kanakarajan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bhuvana</namePart>
<namePart type="family">Kundumani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abhijith</namePart>
<namePart type="family">Abraham</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Malaikannan</namePart>
<namePart type="family">Sankarasubbu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Alberto</namePart>
<namePart type="family">Lavelli</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eben</namePart>
<namePart type="family">Holderness</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Antonio</namePart>
<namePart type="family">Jimeno Yepes</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anne-Lyse</namePart>
<namePart type="family">Minard</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">James</namePart>
<namePart type="family">Pustejovsky</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fabio</namePart>
<namePart type="family">Rinaldi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, United Arab Emirates (Hybrid)</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Sentence embeddings in the form of fixed-size vectors that capture the information in the sentence as well as the context are critical components of Natural Language Processing systems. With transformer model based sentence encoders outperforming the other sentence embedding methods in the general domain, we explore the transformer based architectures to generate dense sentence embeddings in the biomedical domain. In this work, we present BioSimCSE, where we train sentence embeddings with domain specific transformer based models with biomedical texts. We assess our model’s performance with zero-shot and fine-tuned settings on Semantic Textual Similarity (STS) and Recognizing Question Entailment (RQE) tasks. Our BioSimCSE model using BioLinkBERT achieves state of the art (SOTA) performance on both tasks.</abstract>
<identifier type="citekey">kanakarajan-etal-2022-biosimcse</identifier>
<identifier type="doi">10.18653/v1/2022.louhi-1.10</identifier>
<location>
<url>https://aclanthology.org/2022.louhi-1.10</url>
</location>
<part>
<date>2022-12</date>
<extent unit="page">
<start>81</start>
<end>86</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T BioSimCSE: BioMedical Sentence Embeddings using Contrastive learning
%A Kanakarajan, Kamal raj
%A Kundumani, Bhuvana
%A Abraham, Abhijith
%A Sankarasubbu, Malaikannan
%Y Lavelli, Alberto
%Y Holderness, Eben
%Y Jimeno Yepes, Antonio
%Y Minard, Anne-Lyse
%Y Pustejovsky, James
%Y Rinaldi, Fabio
%S Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates (Hybrid)
%F kanakarajan-etal-2022-biosimcse
%X Sentence embeddings in the form of fixed-size vectors that capture the information in the sentence as well as the context are critical components of Natural Language Processing systems. With transformer model based sentence encoders outperforming the other sentence embedding methods in the general domain, we explore the transformer based architectures to generate dense sentence embeddings in the biomedical domain. In this work, we present BioSimCSE, where we train sentence embeddings with domain specific transformer based models with biomedical texts. We assess our model’s performance with zero-shot and fine-tuned settings on Semantic Textual Similarity (STS) and Recognizing Question Entailment (RQE) tasks. Our BioSimCSE model using BioLinkBERT achieves state of the art (SOTA) performance on both tasks.
%R 10.18653/v1/2022.louhi-1.10
%U https://aclanthology.org/2022.louhi-1.10
%U https://doi.org/10.18653/v1/2022.louhi-1.10
%P 81-86
Markdown (Informal)
[BioSimCSE: BioMedical Sentence Embeddings using Contrastive learning](https://aclanthology.org/2022.louhi-1.10) (Kanakarajan et al., Louhi 2022)
ACL
- Kamal raj Kanakarajan, Bhuvana Kundumani, Abhijith Abraham, and Malaikannan Sankarasubbu. 2022. BioSimCSE: BioMedical Sentence Embeddings using Contrastive learning. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI), pages 81–86, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.