Beyond Methods and Datasets Entities: Introducing SH-NER for Hardware and Software Entity Recognition in Scientific Text

Aftab Anjum, Nimra Maqbool, Ralf Krestel


Abstract
Scientific Information Extraction (SciIE) has become essential for organizing and understanding scientific literature, powering tasks such as knowledge graph construction, method recommendation, and automated literature reviews. Although prior SciIE work commonly annotates entities such as tasks, methods, and datasets, it systematically neglects infrastructure-related entities like hardware and software specifications mentioned in publications. This gap limits key applications: knowledge graphs remain incomplete, and recommendation systems cannot effectively filter methods based on hardware compatibility. To address this gap, we introduce SH-NER, the first large-scale, manually annotated dataset focused on infrastructure-related entities in NLP research. SH-NER comprises 1,128 full-text papers from the ACL Anthology and annotates five entity types: Software, Cloud-Platform, Hardware-Device, Device-Count, and Device-Memory. The dataset contains over 9k sample sentences with around 6k annotated entity mentions. To assess the effectiveness of SH-NER, we conducted comprehensive experiments using state-of-the-art supervised models alongside large language models (LLMs) as baselines. The results show that SH-NER improves scientific information extraction by better capturing infrastructure mentions. The manually annotated dataset is available at https://github.com/coderhub84/SH-NER.
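To make the five entity types concrete, the following is a minimal sketch of how such annotations could be represented as token-level BIO tags. It is an illustration only: the abstract does not state the dataset's file format or tagging scheme, so the BIO encoding, the helper names `bio_labels` and `spans_to_bio`, the example sentence, and the span boundaries are all assumptions rather than material taken from SH-NER.

```python
# Entity types as listed in the abstract.
ENTITY_TYPES = [
    "Software",
    "Cloud-Platform",
    "Hardware-Device",
    "Device-Count",
    "Device-Memory",
]


def bio_labels(entity_types):
    """Build a BIO label set: "O" plus a B-/I- pair for each entity type."""
    labels = ["O"]
    for etype in entity_types:
        labels += [f"B-{etype}", f"I-{etype}"]
    return labels


def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, type) token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


# Hypothetical sentence and spans (assumed for illustration, not from the dataset).
tokens = ["We", "train", "on", "4", "NVIDIA", "V100", "GPUs",
          "with", "32", "GB", "using", "PyTorch", "."]
spans = [
    (3, 4, "Device-Count"),      # "4"
    (4, 6, "Hardware-Device"),   # "NVIDIA V100"
    (8, 10, "Device-Memory"),    # "32 GB"
    (11, 12, "Software"),        # "PyTorch"
]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under this assumed encoding, a token classifier would predict one of 11 labels per token (one "O" label plus a B-/I- pair for each of the five types).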
Anthology ID:
2025.ranlp-1.10
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
85–94
URL:
https://aclanthology.org/2025.ranlp-1.10/
Cite (ACL):
Aftab Anjum, Nimra Maqbool, and Ralf Krestel. 2025. Beyond Methods and Datasets Entities: Introducing SH-NER for Hardware and Software Entity Recognition in Scientific Text. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 85–94, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Beyond Methods and Datasets Entities: Introducing SH-NER for Hardware and Software Entity Recognition in Scientific Text (Anjum et al., RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.10.pdf