Taxonomy-Driven Knowledge Graph Construction for Domain-Specific Scientific Applications

Huitong Pan; Qi Zhang; Mustapha Adamu; Eduard Dragut; Longin Jan Latecki

doi:10.18653/v1/2025.findings-acl.223

Abstract

We present a taxonomy-driven framework for constructing domain-specific knowledge graphs (KGs) that integrates structured taxonomies, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). Although we focus on climate science to illustrate its effectiveness, our approach can potentially be adapted to other specialized domains. Existing methods often neglect curated taxonomies (hierarchies of verified entities and relationships), and LLMs frequently struggle to extract KGs in specialized domains. Our approach addresses these gaps by anchoring extraction to expert-curated taxonomies, aligning entities and relations with domain semantics, and validating LLM outputs using RAG against the domain taxonomy. Through a climate science case study using our annotated dataset of 25 publications (1,705 entity-publication links, 3,618 expert-validated relationships), we demonstrate that taxonomy-guided LLM prompting combined with RAG-based validation reduces hallucinations by 23.3% while improving F1 scores by 13.9% compared to baselines without the proposed techniques. Our contributions include: 1) a generalizable methodology for taxonomy-aligned KG construction; 2) a reproducible annotation pipeline; 3) the first benchmark dataset for climate science information retrieval; and 4) empirical insights into combining structured taxonomies with LLMs for specialized domains. The dataset, including expert annotations and taxonomy-aligned outputs, is publicly available at https://github.com/Jo-Pan/ClimateIE, and the accompanying framework can be accessed at https://github.com/Jo-Pan/TaxoDrivenKG.

Anthology ID: 2025.findings-acl.223
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4295–4320
URL: https://aclanthology.org/2025.findings-acl.223/
DOI: 10.18653/v1/2025.findings-acl.223
Cite (ACL): Huitong Pan, Qi Zhang, Mustapha Adamu, Eduard Dragut, and Longin Jan Latecki. 2025. Taxonomy-Driven Knowledge Graph Construction for Domain-Specific Scientific Applications. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4295–4320, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Taxonomy-Driven Knowledge Graph Construction for Domain-Specific Scientific Applications (Pan et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.223.pdf
