James Dunham


2024

pdf bib
Multi-Label Field Classification for Scientific Documents using Expert and Crowd-sourced Knowledge
Rebecca Gelles | James Dunham
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia

Taxonomies of scientific research seek to describe complex domains of activity that are overlapping and dynamic. We address this challenge by combining knowledge curated by the Wikipedia community with the input of subject-matter experts to identify, define, and validate a system of 1,110 granular fields of study for use in multi-label classification of scientific publications. The result is capable of categorizing research across subfields of artificial intelligence, computer security, semiconductors, genetics, virology, immunology, neuroscience, biotechnology, and bioinformatics. We then develop and evaluate a solution for zero-shot classification of publications in terms of these fields.

2022

pdf bib
Multi-label Classification of Scientific Research Documents Across Domains and Languages
Autumn Toney | James Dunham
Proceedings of the Third Workshop on Scholarly Document Processing

Automatically organizing scholarly literature is a necessary and challenging task. By assigning scientific research publications key concepts, researchers, policymakers, and the general public are able to search for and discover relevant research literature. The organization of scientific research evolves with new discoveries and publications, requiring an up-to-date and scalable text classification model. Additionally, scientific research publications benefit from multi-label classification, particularly with more fine-grained sub-domains. Prior work has focused on classifying scientific publications from one research area (e.g., computer science), referencing static concept descriptions, and implementing an English-only classification model. We propose a multi-label classification model that can be implemented in non-English languages, across all of scientific literature, with updatable concept descriptions.