SLäNDa version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature

Sara Stymne, Carin Östman


Abstract
In this paper, we describe version 2.0 of the SLäNDa corpus. SLäNDa, the Swedish Literary corpus of Narrative and Dialogue, now contains excerpts from 19 novels, written between 1809 and 1940. The main focus of the SLäNDa corpus is to distinguish between direct speech and the main narrative. In order to isolate the narrative, we also annotate everything else that does not belong to the narrative, such as thoughts, quotations, and letters. SLäNDa version 2.0 has a slightly updated annotation scheme compared to version 1.0. In addition, we added new texts from eleven authors and performed quality control on the previous version. We are specifically interested in different ways of marking speech segments, such as quotation marks, dashes, or no marking at all. To allow a detailed evaluation of this aspect, we added dedicated test sets to SLäNDa for these different types of speech marking. In a pilot experiment, we explore the impact of typographic speech marking by using these test sets and by artificially stripping the training data of speech markers.
Anthology ID:
2022.lrec-1.570
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5324–5333
URL:
https://aclanthology.org/2022.lrec-1.570
Cite (ACL):
Sara Stymne and Carin Östman. 2022. SLäNDa version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5324–5333, Marseille, France. European Language Resources Association.
Cite (Informal):
SLäNDa version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature (Stymne & Östman, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.570.pdf