Multi-label Classification of Scientific Research Documents Across Domains and Languages
Autumn Toney | James Dunham
Proceedings of the Third Workshop on Scholarly Document Processing
Automatically organizing scholarly literature is a necessary and challenging task. Assigning key concepts to scientific research publications enables researchers, policymakers, and the general public to search for and discover relevant research literature. Because the organization of scientific research evolves with new discoveries and publications, it requires an up-to-date and scalable text classification model. Additionally, scientific research publications benefit from multi-label classification, particularly for finer-grained sub-domains. Prior work has focused on classifying scientific publications from a single research area (e.g., computer science), relying on static concept descriptions, and implementing English-only classification models. We propose a multi-label classification model that can be applied to non-English languages, across all of scientific literature, with updatable concept descriptions.