Haoyu Huang
CUHK, HKUST, Peking University
2025
Retrieval-Augmented Generation with Hierarchical Knowledge
Haoyu Huang | Yongfeng Huang | Yang Junjie | Zhenyu Pan | Yongqiang Chen | Kaili Ma | Hongzhi Chen | James Cheng
Findings of the Association for Computational Linguistics: EMNLP 2025
Graph-based Retrieval-Augmented Generation (RAG) methods have significantly enhanced the performance of large language models (LLMs) on domain-specific tasks. However, existing RAG methods do not adequately exploit the hierarchical knowledge inherent in human cognition, which limits the capabilities of RAG systems. In this paper, we introduce HiRAG, a new RAG approach that utilizes hierarchical knowledge to enhance the semantic understanding and structure-capturing capabilities of RAG systems during both indexing and retrieval. Extensive experiments demonstrate that HiRAG achieves significant performance improvements over state-of-the-art baseline methods.
Can LLMs be Good Graph Judge for Knowledge Graph Construction?
Haoyu Huang | Chong Chen | Zeang Sheng | Yang Li | Wentao Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In real-world scenarios, most data obtained from information retrieval (IR) systems is unstructured, and converting natural language sentences into structured Knowledge Graphs (KGs) remains a critical challenge. We identify three limitations of existing KG construction methods: (1) real-world documents can contain a large amount of noise, leading to the extraction of messy information; (2) naive LLMs often extract inaccurate knowledge from domain-specific documents; and (3) the hallucination phenomenon cannot be overlooked when LLMs are used directly to construct KGs. In this paper, we propose GraphJudge, a KG construction framework that addresses these challenges. We design an entity-centric strategy to eliminate noisy information in the documents, and we fine-tune an LLM as a graph judge to enhance the quality of the generated KGs. Experiments on two general and one domain-specific text-graph pair datasets demonstrate state-of-the-art performance against various baseline methods, with strong generalization ability.