Evaluating Hierarchical Document Categorisation

Qian Sun, Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Daniel Beck, Tomoya Iwakura, Timothy Baldwin


Abstract
Hierarchical document categorisation is a special case of multi-label document categorisation in which the labels are organised in a taxonomic hierarchy. While various approaches have been proposed for hierarchical document categorisation, there is no standard benchmark dataset, so different methods have been evaluated independently and there is no empirical consensus on which methods perform best. In this work, we examine different combinations of neural text encoders and hierarchical methods in an end-to-end framework, and evaluate them over three datasets. We find that the performance of hierarchical document categorisation is determined not only by how the hierarchical information is modelled, but also by the structure of the label hierarchy and the class distribution.
Anthology ID:
2021.alta-1.20
Volume:
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2021
Address:
Online
Editors:
Afshin Rahimi, William Lane, Guido Zuccon
Venue:
ALTA
Publisher:
Australasian Language Technology Association
Pages:
179–184
URL:
https://aclanthology.org/2021.alta-1.20
Cite (ACL):
Qian Sun, Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Daniel Beck, Tomoya Iwakura, and Timothy Baldwin. 2021. Evaluating Hierarchical Document Categorisation. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, pages 179–184, Online. Australasian Language Technology Association.
Cite (Informal):
Evaluating Hierarchical Document Categorisation (Sun et al., ALTA 2021)
PDF:
https://aclanthology.org/2021.alta-1.20.pdf
Data:
RCV1, WOS