Abhijith Abraham


2022

BioSimCSE: BioMedical Sentence Embeddings using Contrastive learning
Kamal raj Kanakarajan | Bhuvana Kundumani | Abhijith Abraham | Malaikannan Sankarasubbu
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)

Sentence embeddings, fixed-size vectors that capture the information in a sentence as well as its context, are critical components of Natural Language Processing systems. With transformer-based sentence encoders outperforming other sentence embedding methods in the general domain, we explore transformer-based architectures to generate dense sentence embeddings in the biomedical domain. In this work, we present BioSimCSE, where we train sentence embeddings with domain-specific transformer-based models on biomedical texts. We assess our model’s performance in zero-shot and fine-tuned settings on Semantic Textual Similarity (STS) and Recognizing Question Entailment (RQE) tasks. Our BioSimCSE model using BioLinkBERT achieves state-of-the-art (SOTA) performance on both tasks.