Juncheng Zeng
2020
Document Classification for COVID-19 Literature
Bernal Jiménez Gutiérrez | Juncheng Zeng | Dongdong Zhang | Ping Zhang | Yu Su
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020
The global pandemic has made it more important than ever to quickly and accurately retrieve relevant scientific literature for effective consumption by researchers in a wide range of fields. We provide an analysis of several multi-label document classification models on the LitCovid dataset. We find that pre-trained language models outperform other models in both low- and high-data regimes, achieving a maximum F1 score of around 86%. We note that even the highest-performing models still struggle with label correlation, distraction from introductory text, and generalization to CORD-19. Both data and code are available on GitHub.
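In multi-label classification each document may carry several topic labels at once, so an F1 score must be aggregated across labels. The abstract does not say which averaging scheme produces the ~86% figure. The sketch below is therefore only an illustration, not the paper's evaluation code. It computes the two common variants from per-label counts using F1 = 2TP / (2TP + FP + FN). Micro-averaging pools the counts over all labels, and macro-averaging takes the unweighted mean of the per-label F1 scores. The function names, the example label names, and the toy predictions are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def per_label_counts(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) for each label."""
    counts: Dict[str, Tuple[int, int, int]] = {}
    for label in labels:
        tp = fp = fn = 0
        for g, p in zip(gold, pred):
            if label in p and label in g:
                tp += 1  # predicted and correct
            elif label in p:
                fp += 1  # predicted but not in the gold set
            elif label in g:
                fn += 1  # in the gold set but missed
        counts[label] = (tp, fp, fn)
    return counts


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is 0 (a convention choice)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Pool TP/FP/FN over all labels, then compute a single F1."""
    counts = per_label_counts(gold, pred, labels).values()
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1_from_counts(tp, fp, fn)


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    counts = per_label_counts(gold, pred, labels)
    return sum(f1_from_counts(*c) for c in counts.values()) / len(labels)


# Hypothetical toy data: three documents with illustrative topic labels.
LABELS = ["Treatment", "Mechanism", "Diagnosis", "Prevention", "Transmission"]
GOLD = [{"Treatment", "Mechanism"}, {"Diagnosis"}, {"Prevention", "Transmission"}]
PRED = [{"Treatment"}, {"Diagnosis", "Prevention"}, {"Prevention"}]

micro = micro_f1(GOLD, PRED, LABELS)
macro = macro_f1(GOLD, PRED, LABELS)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
```

On this toy data micro-F1 is 2/3 and macro-F1 is 8/15. Macro-averaging is lower here because the two labels the model never predicts (Mechanism, Transmission) each contribute a zero to the mean. Under micro-averaging, by contrast, they only add false negatives to the pooled counts.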