A Multi-level Annotated Corpus of Scientific Papers for Scientific Document Summarization and Cross-document Relation Discovery

Ahmed AbuRa’ed, Horacio Saggion, Luis Chiruzzo


Abstract
Related work sections or literature reviews are an essential part of every scientific article, being crucial for paper reviewing and assessment. The automatic generation of related work sections can be considered an instance of the multi-document summarization problem. To allow the study of this specific problem, we have developed a manually annotated, machine-readable dataset of related work sections, cited papers (i.e. references) and sentences, together with an additional layer of papers citing the references. We additionally present experiments on the identification of cited sentences, using citation contexts as input. The corpus and the gold standard are made available for use by the scientific community.
Anthology ID:
2020.lrec-1.824
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6672–6679
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.824
Cite (ACL):
Ahmed AbuRa’ed, Horacio Saggion, and Luis Chiruzzo. 2020. A Multi-level Annotated Corpus of Scientific Papers for Scientific Document Summarization and Cross-document Relation Discovery. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6672–6679, Marseille, France. European Language Resources Association.
Cite (Informal):
A Multi-level Annotated Corpus of Scientific Papers for Scientific Document Summarization and Cross-document Relation Discovery (AbuRa’ed et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.824.pdf