Inductive Learning on Heterogeneous Graphs Enhanced by LLMs for Software Mention Detection

Gabriel Silva; Mário Rodriges; António Teixeira; Marlene Amorim

doi:10.18653/v1/2025.sdp-1.16

Abstract

This paper explores the synergy between Knowledge Graphs (KGs), Graph Machine Learning (Graph ML), and Large Language Models (LLMs) for multilingual Named Entity Recognition (NER) and Relation Extraction (RE), specifically targeting software mentions within the SOMD 2025 challenge. We propose a methodology in which documents are first transformed into heterogeneous KGs enriched with linguistic features (Universal Dependencies) and external knowledge (entity linking). An inductive GraphSAGE model, operating on PyTorch Geometric’s ‘HeteroData’ structure with dynamically generated multilingual embeddings, performs node classification tasks. For NER, Graph ML identifies candidate entities and types, with an LLM (DeepSeek v3) acting as a validation layer. For RE, Graph ML predicts dependency path convergence points indicative of relations, while the LLM classifies the relation type and direction based on entity context. Our results demonstrate the potential of this hybrid approach, with significant post-competition performance gains that are reported in this paper: NER Phase 2 Macro F1 improved from 0.2953 to 0.4364, and RE Phase 1 reached a Macro F1 of 0.3355. These results highlight the benefits of integrating structured graph learning with LLM reasoning for information extraction.
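To make the graph-learning step more concrete, the following is a minimal sketch of one GraphSAGE-style mean-aggregation step over a typed (heterogeneous) graph. It is not the authors' implementation: it uses plain Python instead of PyTorch Geometric, omits the learned weight matrices and non-linearity of a real GraphSAGE layer, and the node types (`token`, `entity`), relation names (`dep`, `links`), and toy feature vectors are assumptions chosen purely for illustration. It also assumes every node type shares the same feature dimension.

```python
from collections import defaultdict


def sage_mean_step(features, edges):
    """One weight-free GraphSAGE-style mean-aggregation step on a typed graph.

    features: {node_type: {node_id: [float, ...]}}
    edges:    {(src_type, relation, dst_type): [(src_id, dst_id), ...]}
    Returns new features: self vector plus, per relation, the mean of
    the incoming neighbour vectors.
    """
    # Collect incoming neighbour vectors per destination node, per relation.
    inbox = defaultdict(lambda: defaultdict(list))
    for (src_type, relation, dst_type), pairs in edges.items():
        for src, dst in pairs:
            inbox[(dst_type, dst)][relation].append(features[src_type][src])

    updated = {}
    for node_type, nodes in features.items():
        updated[node_type] = {}
        for node_id, vec in nodes.items():
            out = list(vec)  # self term
            for neighbour_vecs in inbox[(node_type, node_id)].values():
                n = len(neighbour_vecs)
                for i in range(len(out)):
                    out[i] += sum(v[i] for v in neighbour_vecs) / n
            updated[node_type][node_id] = out
    return updated


# Hypothetical toy graph: three tokens and one linked external entity.
features = {
    "token": {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]},
    "entity": {0: [2.0, 2.0]},
}
edges = {
    ("token", "dep", "token"): [(0, 1), (2, 1)],   # dependency arcs into token 1
    ("entity", "links", "token"): [(0, 1)],        # entity-linking edge
}
result = sage_mean_step(features, edges)
print(result["token"][1])
```

Because the update depends only on a node's local typed neighbourhood, the same function can be applied to a graph built from an unseen document, which is the sense in which GraphSAGE is described as inductive.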

Anthology ID: 2025.sdp-1.16
Volume: Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Tirthankar Ghosal, Philipp Mayr, Amanpreet Singh, Aakanksha Naik, Georg Rehm, Dayne Freitag, Dan Li, Sonja Schimmler, Anita De Waard
Venues: sdp | WS
Publisher: Association for Computational Linguistics
Pages: 164–172
URL: https://aclanthology.org/2025.sdp-1.16/
DOI: 10.18653/v1/2025.sdp-1.16
Cite (ACL): Gabriel Silva, Mário Rodriges, António Teixeira, and Marlene Amorim. 2025. Inductive Learning on Heterogeneous Graphs Enhanced by LLMs for Software Mention Detection. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 164–172, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Inductive Learning on Heterogeneous Graphs Enhanced by LLMs for Software Mention Detection (Silva et al., sdp 2025)
PDF: https://aclanthology.org/2025.sdp-1.16.pdf
