RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text

Sean Papay, Sebastian Padó


Abstract
We introduce RiQuA (RIch QUotation Annotations), a corpus that provides quotations, including their interpersonal structure (speakers and addressees), for English literary text. The corpus comprises 11 works of 19th-century literature that were manually doubly annotated for direct and indirect quotations. For each quotation, its span, speaker, addressee, and cue are identified (if present). This provides a rich view of dialogue structures not available from other corpora. We detail the process of creating this dataset, discuss the annotation guidelines, and analyze the resulting corpus in terms of inter-annotator agreement and its properties. RiQuA, along with its annotation guidelines and associated scripts, is publicly available for use, modification, and experimentation.
Anthology ID:
2020.lrec-1.104
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
835–841
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.104
Cite (ACL):
Sean Papay and Sebastian Padó. 2020. RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 835–841, Marseille, France. European Language Resources Association.
Cite (Informal):
RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text (Papay & Padó, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.104.pdf