Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation

Mohsen Nayebi Kerdabadi; Arya Hadizadeh Moghaddam; Chen Chen; Dongjie Wang; Zijun Yao

Abstract

In electronic health record (EHR) mining, learning high-quality representations of medical concepts (e.g., standardized diagnosis, medication, and procedure codes) is fundamental for downstream clinical prediction. However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with KG structure for representation learning. To address these challenges, we present MedCo, an LLM-empowered graph learning framework for medical concept representation. MedCo first builds a global knowledge graph (KG) over medical codes by combining statistically reliable associations mined from EHRs with type-constrained LLM prompting to infer semantic relations. It then utilizes LLMs to enrich the KG into a text-attributed graph by generating node descriptions and edge rationales, providing semantic signals for both concepts and their relationships. Finally, MedCo jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN, fusing text semantics and graph structure into unified concept embeddings. Extensive experiments on MIMIC-III and MIMIC-IV show that MedCo consistently improves prediction performance and serves as an effective plug-in concept encoder for standard EHR pipelines.
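The first step described above, mining statistically reliable associations from EHRs, can be illustrated with a small sketch. The abstract does not say which statistic or thresholds MedCo uses, so this example assumes a simple co-occurrence lift with a minimum support count, and it keeps only cross-type pairs to mirror the emphasis on cross-type dependencies. The function name `mine_associations`, the parameters `min_support` and `min_lift`, and the toy visit data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_associations(visits, code_type, min_support=2, min_lift=1.5):
    """Return cross-type code pairs that pass support and lift thresholds.

    Assumption: lift = P(a, b) / (P(a) * P(b)), estimated over visits.
    This is an illustrative stand-in, not the paper's actual criterion.
    """
    n = len(visits)
    single = Counter()  # number of visits containing each code
    pair = Counter()    # number of visits containing each unordered pair
    for visit in visits:
        codes = sorted(set(visit))  # de-duplicate codes within a visit
        single.update(codes)
        pair.update(combinations(codes, 2))

    edges = []
    for (a, b), count in pair.items():
        # Keep only cross-type pairs (e.g., diagnosis-medication).
        if code_type[a] == code_type[b] or count < min_support:
            continue
        lift = count * n / (single[a] * single[b])
        if lift >= min_lift:
            edges.append((a, b, round(lift, 3)))
    return sorted(edges)


# Toy visits; the prefix before ":" is a hypothetical code type
# (D = diagnosis, M = medication, P = procedure).
visits = [
    ["D:diabetes", "M:metformin"],
    ["D:diabetes", "M:metformin", "P:hba1c_test"],
    ["D:hypertension", "M:lisinopril"],
    ["D:hypertension", "M:lisinopril"],
    ["D:diabetes", "D:hypertension"],
]
code_type = {code: code.split(":")[0] for visit in visits for code in visit}

edges = mine_associations(visits, code_type)
for a, b, lift in edges:
    print(f"{a} -- {b} (lift={lift})")
```

In a pipeline like the one the abstract outlines, edges of this kind would presumably be combined with the relations inferred by type-constrained LLM prompting before the graph is enriched with text; how the two sources are merged is not stated in the abstract.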

Anthology ID: 2026.acl-long.753
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 16544–16560
URL: https://aclanthology.org/2026.acl-long.753/
Cite (ACL): Mohsen Nayebi Kerdabadi, Arya Hadizadeh Moghaddam, Chen Chen, Dongjie Wang, and Zijun Yao. 2026. Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16544–16560, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation (Kerdabadi et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.753.pdf
Checklist: 2026.acl-long.753.checklist.pdf