Beyond Knowledge Graphs: PubMedBERT Embeddings as a Competitive Standalone Modality for Drug Re-purposing

Rishik Kondadadi; John E. Ortega

Abstract

Drug repurposing methods rely heavily on knowledge graph (KG) embeddings, but building and curating these graphs takes considerable effort. We present two findings on the Hetionet drug-disease benchmark and an epilepsy ranking task. First, PubMedBERT text embeddings, fed through the same downstream classifiers and identical 10-fold splits as four re-trained KG baselines (TransE, ComplEx, DistMult, RotatE), reach AUROC $0.910$, above all four (best: RotatE, $0.854$); a Random Forest on the same vectors scores $0.880$. The comparison is asymmetric in one important way: PubMedBERT was pretrained on the literature Hetionet was curated from, so the result is best read as “text-with-literature-supervision vs. graph-only,” and a head-to-head with text-augmented KG methods (KG-BERT, TxGNN) is left as follow-up. Second, across all seven combinations of text, molecular (ECFP4), and gene expression (LINCS L1000) features, cross-attention fusion of weaker modalities into text consistently degrades performance, despite a gated mechanism intended to suppress unhelpful modalities; the residual path forces the strong modality to absorb noise. The model also ranks proconvulsants (amoxapine, flumazenil) near the top, because text embeddings encode strength of association with a disease but not its direction.

Anthology ID: 2026.bionlp-1.13
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 135–140
URL: https://aclanthology.org/2026.bionlp-1.13/
Cite (ACL): Rishik Kondadadi and John E. Ortega. 2026. Beyond Knowledge Graphs: PubMedBERT Embeddings as a Competitive Standalone Modality for Drug Re-purposing. In BioNLP 2026, pages 135–140, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Beyond Knowledge Graphs: PubMedBERT Embeddings as a Competitive Standalone Modality for Drug Re-purposing (Kondadadi & Ortega, BioNLP 2026)
PDF: https://aclanthology.org/2026.bionlp-1.13.pdf
