Multi-label Classification of Scientific Research Documents Across Domains and Languages

Autumn Toney, James Dunham


Abstract
Automatically organizing scholarly literature is a necessary and challenging task. Assigning key concepts to scientific research publications enables researchers, policymakers, and the general public to search for and discover relevant research literature. The organization of scientific research evolves with new discoveries and publications, requiring an up-to-date and scalable text classification model. Additionally, scientific research publications benefit from multi-label classification, particularly when they are assigned to more fine-grained sub-domains. Prior work has focused on classifying scientific publications from one research area (e.g., computer science), referencing static concept descriptions, and implementing an English-only classification model. We propose a multi-label classification model that can be implemented in non-English languages, across all of scientific literature, with updatable concept descriptions.
Anthology ID:
2022.sdp-1.12
Volume:
Proceedings of the Third Workshop on Scholarly Document Processing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Arman Cohan, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Drahomira Herrmannova, Petr Knoth, Kyle Lo, Philipp Mayr, Michal Shmueli-Scheuer, Anita de Waard, Lucy Lu Wang
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
105–114
URL:
https://aclanthology.org/2022.sdp-1.12
Cite (ACL):
Autumn Toney and James Dunham. 2022. Multi-label Classification of Scientific Research Documents Across Domains and Languages. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 105–114, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Multi-label Classification of Scientific Research Documents Across Domains and Languages (Toney & Dunham, sdp 2022)
PDF:
https://aclanthology.org/2022.sdp-1.12.pdf