Multi-Label Field Classification for Scientific Documents using Expert and Crowd-sourced Knowledge

Rebecca Gelles, James Dunham


Abstract
Taxonomies of scientific research seek to describe complex domains of activity that are overlapping and dynamic. We address this challenge by combining knowledge curated by the Wikipedia community with the input of subject-matter experts to identify, define, and validate a system of 1,110 granular fields of study for use in multi-label classification of scientific publications. The result is capable of categorizing research across subfields of artificial intelligence, computer security, semiconductors, genetics, virology, immunology, neuroscience, biotechnology, and bioinformatics. We then develop and evaluate a solution for zero-shot classification of publications in terms of these fields.
Anthology ID:
2024.wikinlp-1.7
Volume:
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Lucie Lucie-Aimée, Angela Fan, Tajuddeen Gwadabe, Isaac Johnson, Fabio Petroni, Daniel van Strien
Venue:
WikiNLP
Publisher:
Association for Computational Linguistics
Pages:
14–20
URL:
https://aclanthology.org/2024.wikinlp-1.7
Cite (ACL):
Rebecca Gelles and James Dunham. 2024. Multi-Label Field Classification for Scientific Documents using Expert and Crowd-sourced Knowledge. In Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia, pages 14–20, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Multi-Label Field Classification for Scientific Documents using Expert and Crowd-sourced Knowledge (Gelles & Dunham, WikiNLP 2024)
PDF:
https://aclanthology.org/2024.wikinlp-1.7.pdf