FoRC4CL: A Fine-grained Field of Research Classification and Annotated Dataset of NLP Articles

Raia Abu Ahmad, Ekaterina Borisova, Georg Rehm


Abstract
The steep increase in the number of scholarly publications has given rise to various digital repositories, libraries and knowledge graphs that aim to capture, manage, and preserve scientific data. Efficiently navigating such databases requires a system able to classify scholarly documents according to their respective research (sub-)fields. However, not every digital repository possesses a relevant classification schema for categorising publications. For instance, one of the largest digital archives in Computational Linguistics (CL) and Natural Language Processing (NLP), the ACL Anthology, lacks a system for classifying papers into topics and sub-topics. This paper addresses this gap by constructing a corpus of 1,500 ACL Anthology publications annotated with their main contributions using a novel hierarchical taxonomy of core CL/NLP topics and sub-topics. The corpus is used in a shared task with the goal of classifying CL/NLP papers into their respective sub-topics.
Anthology ID:
2024.lrec-main.651
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
7389–7394
URL:
https://aclanthology.org/2024.lrec-main.651
Cite (ACL):
Raia Abu Ahmad, Ekaterina Borisova, and Georg Rehm. 2024. FoRC4CL: A Fine-grained Field of Research Classification and Annotated Dataset of NLP Articles. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7389–7394, Torino, Italia. ELRA and ICCL.
Cite (Informal):
FoRC4CL: A Fine-grained Field of Research Classification and Annotated Dataset of NLP Articles (Ahmad et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.651.pdf