WeLT: Improving Biomedical Fine-tuned Pre-trained Language Models with Cost-sensitive Learning

Ghadeer Mobasher, Wolfgang Müller, Olga Krebs, Michael Gertz


Abstract
Fine-tuning biomedical pre-trained language models (BioPLMs) such as BioBERT has become a common practice dominating leaderboards across various natural language processing tasks. Despite their success and wide adoption, prevailing fine-tuning approaches for named entity recognition (NER) naively train BioPLMs on targeted datasets without considering class distributions. This is problematic especially when dealing with imbalanced biomedical gold-standard datasets for NER in which most biomedical entities are underrepresented. In this paper, we address the class imbalance problem and propose WeLT, a cost-sensitive fine-tuning approach based on new re-scaled class weights for the task of biomedical NER. We evaluate WeLT’s fine-tuning performance on mixed-domain and domain-specific BioPLMs using eight biomedical gold-standard datasets. We compare our approach against vanilla fine-tuning and three other existing re-weighting schemes. Our results show the positive impact of handling the class imbalance problem. WeLT outperforms all the vanilla fine-tuned models. Furthermore, our method demonstrates advantages over other existing weighting schemes in most experiments.
Anthology ID:
2023.bionlp-1.40
Volume:
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Dina Demner-fushman, Sophia Ananiadou, Kevin Cohen
Venue:
BioNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
427–438
Language:
URL:
https://aclanthology.org/2023.bionlp-1.40
DOI:
10.18653/v1/2023.bionlp-1.40
Bibkey:
Cite (ACL):
Ghadeer Mobasher, Wolfgang Müller, Olga Krebs, and Michael Gertz. 2023. WeLT: Improving Biomedical Fine-tuned Pre-trained Language Models with Cost-sensitive Learning. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 427–438, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
WeLT: Improving Biomedical Fine-tuned Pre-trained Language Models with Cost-sensitive Learning (Mobasher et al., BioNLP 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.bionlp-1.40.pdf
Video:
 https://aclanthology.org/2023.bionlp-1.40.mp4