CORWA: A Citation-Oriented Related Work Annotation Dataset

Xiangci Li; Biswadip Mandal; Jessica Ouyang

doi:10.18653/v1/2022.naacl-main.397

Abstract

Academic research is an exploratory activity to discover new solutions to problems. By its nature, academic research includes literature reviews that distinguish a work's novel contributions from prior work. In natural language processing, this literature review is usually conducted in the “Related Work” section. The task of related work generation aims to automatically generate the related work section given the rest of the research paper and a list of papers to cite. Prior work on this task has focused on the sentence as the basic unit of generation, neglecting the fact that related work sections consist of variable-length text fragments derived from different information sources. As a first step toward a linguistically motivated related work generation framework, we present a Citation-Oriented Related Work Annotation (CORWA) dataset that labels different types of citation text fragments from different information sources. We train a strong baseline model that automatically tags the CORWA labels on a large corpus of unlabeled related work sections. We further suggest a novel framework for human-in-the-loop, iterative, abstractive related work generation.
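The abstract describes labeling variable-length fragments of related work text by type and information source, then training a model to tag those labels automatically. One common way to prepare span annotations for a sequence tagger is to convert them into token-level BIO tags. The sketch below assumes that setup; the `Fragment` class, the `spans_to_bio` function, and the `CITED_PAPER`/`CURRENT_PAPER` labels are hypothetical illustrations, not the CORWA label set or the authors' baseline model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """A labeled span of tokens (hypothetical structure, for illustration only)."""
    start: int  # first token index, inclusive
    end: int    # last token index, exclusive
    label: str  # hypothetical source label, e.g. "CITED_PAPER"


def spans_to_bio(num_tokens: int, fragments: List[Fragment]) -> List[str]:
    """Convert non-overlapping labeled token spans into BIO tags.

    Tokens not covered by any fragment receive the tag "O".
    """
    tags = ["O"] * num_tokens
    for frag in sorted(fragments, key=lambda f: f.start):
        if not (0 <= frag.start < frag.end <= num_tokens):
            raise ValueError(f"invalid span: {frag}")
        if any(t != "O" for t in tags[frag.start:frag.end]):
            raise ValueError(f"overlapping span: {frag}")
        # The first token of a fragment gets B-, the rest get I-.
        tags[frag.start] = "B-" + frag.label
        for i in range(frag.start + 1, frag.end):
            tags[i] = "I-" + frag.label
    return tags


# Toy related work sentence, pre-tokenized on whitespace.
tokens = "Smith et al. ( 2020 ) train an RNN tagger , but we use transformers .".split()
fragments = [
    Fragment(0, 10, "CITED_PAPER"),    # "Smith ... tagger": content from the cited work
    Fragment(12, 15, "CURRENT_PAPER"), # "we use transformers": content about the citing work
]
tags = spans_to_bio(len(tokens), fragments)

for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

Rejecting overlapping spans keeps the BIO encoding unambiguous; a scheme that allows nested or overlapping fragments would need a different representation, such as one tag sequence per label layer.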

Anthology ID:: 2022.naacl-main.397
Volume:: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:: July
Year:: 2022
Address:: Seattle, United States
Editors:: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:: NAACL
Publisher:: Association for Computational Linguistics
Pages:: 5426–5440
URL:: https://aclanthology.org/2022.naacl-main.397/
DOI:: 10.18653/v1/2022.naacl-main.397
Cite (ACL):: Xiangci Li, Biswadip Mandal, and Jessica Ouyang. 2022. CORWA: A Citation-Oriented Related Work Annotation Dataset. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5426–5440, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):: CORWA: A Citation-Oriented Related Work Annotation Dataset (Li et al., NAACL 2022)
PDF:: https://aclanthology.org/2022.naacl-main.397.pdf
Software:: 2022.naacl-main.397.software.zip
Video:: https://aclanthology.org/2022.naacl-main.397.mp4