Artemis Dampa
2025
Investigating Hierarchical Structure in Multi-Label Document Classification
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Effectively organizing the vast and ever-growing body of scientific literature is crucial to advancing research and supporting scholarly discovery. In this paper, we study fine-grained hierarchical multi-label classification of scholarly articles using a structured taxonomy. Specifically, we investigate whether incorporating hierarchical information into a classification method improves performance over conventional flat classification approaches. To this end, we propose and evaluate classification strategies along three axes: (i) selection of positive and negative samples; (ii) soft-to-hard label mapping; and (iii) hierarchical post-processing policies that use taxonomy-related requirements to update the final labeling. Experiments show that flat approaches are strong baselines, but that infusing hierarchical knowledge improves recall-oriented performance, which may be preferable depending on use-case requirements.
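To make the last two axes concrete, the following is a minimal illustrative sketch, not the method evaluated in the paper. It assumes a fixed score threshold for the soft-to-hard mapping and shows two possible post-processing policies over a toy taxonomy. The taxonomy, the label names, the threshold value, and the function names `soft_to_hard`, `add_ancestors`, and `drop_orphans` are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: child label -> parent label (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "cs": None,
    "cs.nlp": "cs",
    "cs.nlp.classification": "cs.nlp",
    "cs.vision": "cs",
}


def soft_to_hard(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Assumed mapping: keep every label whose score reaches the threshold."""
    return {label for label, score in scores.items() if score >= threshold}


def add_ancestors(labels: Set[str]) -> Set[str]:
    """Recall-oriented policy (assumption): add all ancestors of each label."""
    closed = set(labels)
    for label in labels:
        parent = PARENT.get(label)
        while parent is not None:
            closed.add(parent)
            parent = PARENT.get(parent)
    return closed


def drop_orphans(labels: Set[str]) -> Set[str]:
    """Precision-oriented policy (assumption): keep a label only if every
    one of its ancestors was also predicted."""
    kept = set()
    for label in labels:
        parent = PARENT.get(label)
        consistent = True
        while parent is not None:
            if parent not in labels:
                consistent = False
                break
            parent = PARENT.get(parent)
        if consistent:
            kept.add(label)
    return kept


# Illustrative scores only; these are not results from the paper.
scores = {"cs": 0.4, "cs.nlp": 0.3, "cs.nlp.classification": 0.8, "cs.vision": 0.1}
hard = soft_to_hard(scores)
recall_oriented = add_ancestors(hard)
precision_oriented = drop_orphans(hard)
```

In this toy case only the leaf label passes the threshold, so the two policies diverge: adding ancestors produces a taxonomy-consistent label set with more labels, while dropping orphans removes the leaf because its parents were not predicted. This is one way a taxonomy requirement can trade precision for recall.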