ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations

Suchetha Nambanoor Kunnath; Valentin Stauber; Ronin Wu; David Pride; Viktor Botev; Petr Knoth

Abstract

Classifying citations according to their purpose and importance is a challenging task that has gained considerable interest in recent years. This interest has been driven primarily by the need to create more transparent, efficient, merit-based reward systems in academia: systems that go beyond simple bibliometric measures and consider the semantics of citations. Citations whose influence has been quantified and classified by such systems can act as edges that link knowledge nodes into a graph, enabling efficient knowledge discovery. While a number of researchers have experimented with a variety of models, these experiments are typically limited to single-domain applications, and the resulting models are difficult to compare. Recently, two Citation Context Classification (3C) shared tasks (at WOSP2020 and SDP2021) created the first benchmark enabling direct comparison of citation classification approaches, revealing the crucial impact of supplementary data on model performance. Reflecting on the findings of these shared tasks, we are releasing a new multi-disciplinary dataset, ACT2, an extended version of the SDP 3C shared task dataset. This modified corpus has annotations for both citation function and importance classes, newly enriched with supplementary contextual and non-contextual feature sets; the selection of these features follows the lists of features used by the more successful teams in the shared tasks. Additionally, we include contextual features for cited papers (e.g. the abstract of the cited paper), which most existing datasets lack but which have considerable potential to improve results. We describe the methodology used for feature extraction and the challenges involved in the process. The feature-enriched ACT2 dataset is available at https://github.com/oacore/ACT2.
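To make the idea of pairing a citation's own context with features of the cited paper more concrete, the sketch below models one annotated citation and builds a single text input for a classifier. It is a minimal illustration under stated assumptions, not the ACT2 schema: the class `CitationRecord`, its field names, the helper `build_input`, the `[SEP]` separator, and the label strings are all hypothetical, and the fallback to the cited title when no abstract is available is an assumed design choice. Consult the repository for the actual column names and class labels.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout: field names are illustrative only and are NOT
# taken from the ACT2 release; check the repository for the real schema.
@dataclass
class CitationRecord:
    citing_title: str
    cited_title: str
    citation_context: str          # text surrounding the citation marker
    cited_abstract: Optional[str]  # cited-paper feature; may be unavailable
    function_label: str            # citation purpose class (placeholder string)
    importance_label: str          # citation importance class (placeholder string)


def build_input(rec: CitationRecord, sep: str = " [SEP] ") -> str:
    """Join the citation context with text describing the cited paper.

    Assumption: when the cited abstract is missing or empty, fall back to the
    cited title so every record still yields a usable classifier input.
    """
    cited_text = rec.cited_abstract if rec.cited_abstract else rec.cited_title
    return rec.citation_context + sep + cited_text


if __name__ == "__main__":
    rec = CitationRecord(
        citing_title="Paper A",
        cited_title="Paper B",
        citation_context="We follow the method of [CIT].",
        cited_abstract=None,
        function_label="uses",          # placeholder label
        importance_label="influential",  # placeholder label
    )
    print(build_input(rec))
```

Running the script prints `We follow the method of [CIT]. [SEP] Paper B`, because the example record has no cited abstract and the title is used instead.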

Anthology ID: 2022.lrec-1.363
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 3398–3406
URL: https://aclanthology.org/2022.lrec-1.363/
Cite (ACL): Suchetha Nambanoor Kunnath, Valentin Stauber, Ronin Wu, David Pride, Viktor Botev, and Petr Knoth. 2022. ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3398–3406, Marseille, France. European Language Resources Association.
Cite (Informal): ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations (Nambanoor Kunnath et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.363.pdf