Investigating Hierarchical Structure in Multi-Label Document Classification

Artemis Dampa


Abstract
Effectively organizing the vast and ever-growing body of scientific literature is crucial to advancing research and supporting scholarly discovery. In this paper, we study fine-grained hierarchical multi-label classification of scholarly articles using a structured taxonomy. Specifically, we investigate whether incorporating hierarchical information into a classification method improves performance over conventional flat classification approaches. To this end, we propose and evaluate classification strategies along three axes: selection of positive and negative samples; soft-to-hard label mapping; and hierarchical post-processing policies that use taxonomy-related requirements to update the final labeling. Experiments demonstrate that flat approaches are strong baselines, but that infusing hierarchical knowledge improves recall-focused performance where use-case requirements call for it.
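To make two of the three axes concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a fixed score threshold for the soft-to-hard mapping and an "add all ancestors" rule as one possible hierarchical post-processing policy. The taxonomy, label names, scores, and function names are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: child label -> parent label (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "cs": None,
    "cs.nlp": "cs",
    "cs.nlp.classification": "cs.nlp",
    "bio": None,
}


def soft_to_hard(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Assumed mapping: keep every label whose score reaches the threshold."""
    return {label for label, score in scores.items() if score >= threshold}


def add_ancestors(labels: Set[str]) -> Set[str]:
    """Assumed policy: a predicted label implies all of its ancestors."""
    closed = set(labels)
    for label in labels:
        parent = PARENT.get(label)
        while parent is not None:
            closed.add(parent)
            parent = PARENT.get(parent)
    return closed


# Hypothetical per-label scores from a flat classifier for one document.
scores = {"cs": 0.4, "cs.nlp": 0.3, "cs.nlp.classification": 0.8, "bio": 0.1}

flat = soft_to_hard(scores)          # only the leaf clears the threshold
hierarchical = add_ancestors(flat)   # leaf plus its two ancestors
```

In this toy case the flat labeling contains only the leaf, while the post-processed labeling also includes its ancestors. A policy of this kind can only add labels, so it can raise recall but may lower precision.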
Anthology ID:
2025.ranlp-stud.2
Volume:
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Boris Velichkov, Ivelina Nikolova-Koleva, Milena Slavcheva
Venues:
RANLP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
10–19
URL:
https://aclanthology.org/2025.ranlp-stud.2/
Cite (ACL):
Artemis Dampa. 2025. Investigating Hierarchical Structure in Multi-Label Document Classification. In Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing, pages 10–19, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Investigating Hierarchical Structure in Multi-Label Document Classification (Dampa, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-stud.2.pdf