Augmented Bio-SBERT: Improving Performance for Pairwise Sentence Tasks in Bio-medical Domain
Sonam Pankaj | Amit Gautam
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
Access to high-quality annotated data is one of the current challenges in AI, especially in NLP, which is why data augmentation is gaining importance. While image data augmentation is standard in computer vision, text data augmentation in NLP is harder because of the complexity of language. Moreover, augmentation has proven advantageous when little data is available, where it can significantly improve a model’s accuracy and performance. We apply augmentation to pairwise sentence scoring in the biomedical domain. Through experiments on downstream tasks with biomedical data, we investigate how to improve the performance of bi-encoder sentence transformers using an augmented dataset generated by cross-encoders, which are fine-tuned on BIOSSES and MedNLI starting from the pre-trained Bio-BERT model. This approach significantly improves the results compared with a model trained only on gold data for the respective tasks.
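The data flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a scorer labels unlabeled sentence pairs to produce "silver" data, which is then merged with the gold data before the bi-encoder is trained. The function names (`toy_cross_encoder_score`, `build_augmented_dataset`) and the example sentences are hypothetical, and the token-overlap scorer is only a stand-in for a cross-encoder fine-tuned on gold data. How the unlabeled pairs are sampled and how the bi-encoder is trained are not covered here.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
LabeledPair = Tuple[str, str, float]


def toy_cross_encoder_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a fine-tuned cross-encoder.

    Returns the Jaccard overlap of the two token sets, a value in [0, 1].
    A real pipeline would call a trained model here instead.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def build_augmented_dataset(
    gold: List[LabeledPair],
    unlabeled: List[Pair],
    score_fn: Callable[[str, str], float],
) -> List[LabeledPair]:
    """Label the unlabeled pairs with score_fn and append them to the gold data."""
    silver = [(a, b, score_fn(a, b)) for a, b in unlabeled]
    return list(gold) + silver


# Illustrative data only; not taken from BIOSSES or MedNLI.
gold = [("the patient has a fever", "the patient is febrile", 0.9)]
unlabeled = [
    ("aspirin reduces pain", "aspirin reduces fever"),
    ("the heart pumps blood", "insulin regulates glucose"),
]

augmented = build_augmented_dataset(gold, unlabeled, toy_cross_encoder_score)
for sentence_a, sentence_b, score in augmented:
    print(f"{score:.2f}\t{sentence_a}\t{sentence_b}")
```

In this sketch the bi-encoder would then be trained on `augmented` rather than on `gold` alone; swapping `toy_cross_encoder_score` for a real model's scoring function leaves the rest of the flow unchanged.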