Learning from Negative Samples in Biomedical Generative Entity Linking

Chanhwi Kim; Hyunjae Kim; Sihyeon Park; Jiwoo Lee; Mujeen Sung; Jaewoo Kang

doi:10.18653/v1/2025.findings-acl.558

Abstract

Generative models have become widely used in biomedical entity linking (BioEL) due to their excellent performance and efficient memory usage. However, these models are usually trained only with positive samples (entities that match the input mention's identifier) and do not explicitly learn from hard negative samples, which are entities that look similar but have different meanings. To address this limitation, we introduce ANGEL (Learning from Negative Samples in Biomedical Generative Entity Linking), the first framework that trains generative BioEL models using negative samples. Specifically, a generative model is first trained to generate positive entity names from the knowledge base for given input mentions. Next, both correct and incorrect outputs are gathered from the model's top-k predictions. Finally, the model is updated to prioritize the correct predictions through preference optimization. Our models fine-tuned with ANGEL outperform the previous best baseline models by up to 1.4% in average top-1 accuracy across five benchmarks. When our framework is also incorporated into pre-training, the improvement increases further to 1.7%, demonstrating its effectiveness in both the pre-training and fine-tuning stages. The code and model weights are available at https://github.com/dmis-lab/ANGEL.
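The abstract names "preference optimization" without giving the objective, so the sketch below assumes a DPO-style (Direct Preference Optimization) loss. It is an illustration, not a reproduction of the released ANGEL code. It shows how a beam of top-k generated names could be split into correct and incorrect outputs and turned into (chosen, rejected) pairs, and how such a pair is scored against a frozen reference model. Every name here is hypothetical: `Candidate`, `build_preference_pairs`, `dpo_loss`, `beta=0.1`, the choice to pair every positive with every negative, and the toy identifiers and log-probabilities.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One beam output from a generative linker (hypothetical fields)."""
    name: str           # generated entity name
    kb_id: str          # identifier of the KB entry the name maps to
    policy_logp: float  # sequence log-prob under the model being updated
    ref_logp: float     # sequence log-prob under the frozen reference model


def build_preference_pairs(gold_id: str,
                           top_k: List[Candidate]) -> List[Tuple[Candidate, Candidate]]:
    """Pair every correct prediction with every incorrect (hard negative) one."""
    positives = [c for c in top_k if c.kb_id == gold_id]
    negatives = [c for c in top_k if c.kb_id != gold_id]
    # Assumption: if the beam holds no correct name, this sketch builds no pairs.
    return [(pos, neg) for pos in positives for neg in negatives]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(chosen: Candidate, rejected: Candidate, beta: float = 0.1) -> float:
    """DPO-style loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = chosen.policy_logp - chosen.ref_logp
    rejected_ratio = rejected.policy_logp - rejected.ref_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return softplus(-margin)  # equals -log(sigmoid(margin))


def mean_dpo_loss(pairs: List[Tuple[Candidate, Candidate]], beta: float = 0.1) -> float:
    """Average the pairwise loss over all pairs for one mention."""
    if not pairs:
        return 0.0
    return sum(dpo_loss(c, r, beta) for c, r in pairs) / len(pairs)


# Toy beam for one mention; names, IDs, and log-probs are made up.
top_k = [
    Candidate("type 2 diabetes mellitus", "ID-T2DM", -1.2, -1.0),
    Candidate("type 1 diabetes mellitus", "ID-T1DM", -0.9, -1.1),
    Candidate("diabetes insipidus", "ID-DI", -2.5, -2.0),
]
pairs = build_preference_pairs("ID-T2DM", top_k)
print(len(pairs), mean_dpo_loss(pairs))
```

Minimizing this loss raises the correct name's likelihood relative to the reference model while lowering that of look-alike negatives such as "type 1 diabetes mellitus". Anchoring both sides to the reference model is what distinguishes a DPO-style objective from plain contrastive ranking on raw log-probabilities.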

Anthology ID:: 2025.findings-acl.558
Volume:: Findings of the Association for Computational Linguistics: ACL 2025
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 10714–10730
URL:: https://aclanthology.org/2025.findings-acl.558/
DOI:: 10.18653/v1/2025.findings-acl.558
Cite (ACL):: Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, and Jaewoo Kang. 2025. Learning from Negative Samples in Biomedical Generative Entity Linking. In Findings of the Association for Computational Linguistics: ACL 2025, pages 10714–10730, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Learning from Negative Samples in Biomedical Generative Entity Linking (Kim et al., Findings 2025)
PDF:: https://aclanthology.org/2025.findings-acl.558.pdf