Rebecca Gelles
2024
Multi-Label Field Classification for Scientific Documents using Expert and Crowd-sourced Knowledge
Rebecca Gelles | James Dunham
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia
Taxonomies of scientific research seek to describe complex domains of activity that are overlapping and dynamic. We address this challenge by combining knowledge curated by the Wikipedia community with the input of subject-matter experts to identify, define, and validate a system of 1,110 granular fields of study for use in multi-label classification of scientific publications. The result is capable of categorizing research across subfields of artificial intelligence, computer security, semiconductors, genetics, virology, immunology, neuroscience, biotechnology, and bioinformatics. We then develop and evaluate a solution for zero-shot classification of publications in terms of these fields.