2024

Evaluating In-Context Learning for Computational Literary Studies: A Case Study Based on the Automatic Recognition of Knowledge Transfer in German Drama
Janis Pagel | Axel Pichler | Nils Reiter
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
In this paper, we evaluate two different natural language processing (NLP) approaches to solve a paradigmatic task for computational literary studies (CLS): the recognition of knowledge transfer in literary texts. We focus on the question of how adequately large language models capture the transfer of knowledge about family relations in German drama texts when this transfer is treated as a classification or textual entailment task using in-context learning (ICL). We find that a 13 billion parameter LLAMA 2 model performs best on the former, while GPT-4 performs best on the latter task. However, all models achieve relatively low scores compared to standard NLP benchmark results, suffer from inconsistencies under small changes in prompts, and are often unable to make simple inferences beyond the textual surface, which is why an unreflective, generic use of ICL in CLS still seems inadvisable.

Modeling Moravian Memoirs: Ternary Sentiment Analysis in a Low Resource Setting
Patrick Brookshire | Nils Reiter
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
The Moravians are a Christian group that emerged from a 15th-century movement. In this paper, we investigate how memoirs written by the devotees of this group can be analyzed with methods from computational linguistics, in particular sentiment analysis. To this end, we experiment with two different fine-tuning strategies and find that the best performance for ternary sentiment analysis (81% accuracy) is achieved by fine-tuning a German BERT model, outperforming in particular models trained on much larger German sentiment datasets. We further investigate the models using SHAP scores and find that the best-performing model struggles with multiple negations and mixed statements. Finally, we show two application scenarios motivated by research questions from religious studies.
2023

Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Stefania Degaetano-Ortlieb | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
2022

Exploring Text Recombination for Automatic Narrative Level Detection
Nils Reiter | Judith Sieker | Svenja Guhr | Evelyn Gius | Sina Zarrieß
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Automating the process of understanding the global narrative structure of long texts and stories is still a major challenge for state-of-the-art natural language understanding systems, particularly because annotated data is scarce and existing annotation workflows do not scale well to the annotation of complex narrative phenomena. In this work, we focus on the identification of narrative levels in texts corresponding to stories that are embedded in stories. Lacking sufficient pre-annotated training data, we explore a solution to deal with data scarcity that is common in machine learning: the automatic augmentation of an existing small data set of annotated samples with the help of data synthesis. We present a workflow for narrative level detection that includes the operationalization of the task, a model, and a data augmentation protocol for automatically generating narrative texts annotated with breaks between narrative levels. Our experiments suggest that narrative levels in long texts constitute a challenging phenomenon for state-of-the-art NLP models, but generating training data synthetically does improve the prediction results considerably.

Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Stefania Degaetano | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
2021

Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Stefania Degaetano-Ortlieb | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Detecting Scenes in Fiction: A new Segmentation Task
Albin Zehe | Leonard Konle | Lea Katharina Dümpelmann | Evelyn Gius | Andreas Hotho | Fotis Jannidis | Lucas Kaufmann | Markus Krug | Frank Puppe | Nils Reiter | Annekea Schreiber | Nathalie Wiedmer
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
This paper introduces the novel task of scene segmentation on narrative texts and provides an annotated corpus, a discussion of the linguistic and narrative properties of the task and baseline experiments towards automatic solutions. A scene here is a segment of the text where story time and discourse time are more or less equal, the narration focuses on one action, and location and character constellations stay the same. The corpus we describe consists of German-language dime novels (550k tokens) that have been annotated in parallel, achieving an inter-annotator agreement of gamma = 0.7. Baseline experiments using BERT achieve an F1 score of 24%, showing that the task is very challenging. Automatic scene segmentation paves the way towards processing longer narrative texts like tales or novels by breaking them down into smaller, coherent and meaningful parts, which is an important stepping stone towards the reconstruction of plot in Computational Literary Studies but can also serve to improve tasks like coreference resolution.

DramaCoref: A Hybrid Coreference Resolution System for German Theater Plays
Janis Pagel | Nils Reiter
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
We present a system for resolving coreference on theater plays, DramaCoref. The system uses neural network techniques to provide a list of potential mentions. These mentions are assigned to common entities using generic and domain-specific rules. We find that DramaCoref works well on the theater plays when compared to corpora from other domains and profits from the inclusion of information specific to theater plays. On the best-performing setup, it achieves a CoNLL score of 32% when using automatically detected mentions and 55% when using gold mentions. Single rules achieve high precision scores; however, rules designed on other domains are often not applicable or yield unsatisfactory results. Error analysis shows that the mention detection is the main weakness of the system, providing directions for future improvements.
2020

GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German
Janis Pagel | Nils Reiter
Proceedings of the Twelfth Language Resources and Evaluation Conference
Dramatic texts are a highly structured literary text type. Their quantitative analysis so far has relied on analysing structural properties (e.g., in the form of networks). Resolving coreferences is crucial for an analysis of the content of the character speech, but developing automatic coreference resolution (CR) systems depends on the existence of annotated corpora. In this paper, we present an annotated corpus of German dramatic texts, a preliminary analysis of the corpus as well as some baseline experiments on automatic CR. The analysis shows that with respect to the reference structure, dramatic texts are very different from news texts, but more similar to other dialogical text types such as interviews. In baseline experiments, the rule-based CR system CorZu achieves a CoNLL score of 28.8. In the future, we plan to integrate the (partial) information given in the dramatis personae into the CR model.

Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Stefania DeGaetano | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
2019

Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Beatrice Alex | Stefania Degaetano-Ortlieb | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
2018

QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation
Kordula De Kuthy | Nils Reiter | Arndt Riester
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Beatrice Alex | Stefania Degaetano-Ortlieb | Anna Feldman | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Towards Coreference for Literary Text: Analyzing Domain-Specific Phenomena
Ina Roesiger | Sarah Schulz | Nils Reiter
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Coreference resolution is the task of grouping together references to the same discourse entity. Resolving coreference in literary texts could benefit a number of Digital Humanities (DH) tasks, such as analyzing the depiction of characters and/or their relations. Domain-dependent training data has been shown to improve coreference resolution for many domains, e.g. the biomedical domain, as its properties differ significantly from news text or dialogue, on which automatic systems are typically trained. Literary texts could also benefit from corpora annotated with coreference. We therefore analyze the specific properties of coreference-related phenomena on a number of texts and give directions for the adaptation of annotation guidelines. As some of the adaptations have a profound impact, we also present a new annotation tool for coreference, with a focus on enabling annotation of long texts with many discourse entities.
2017

Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Beatrice Alex | Stefania Degaetano-Ortlieb | Anna Feldman | Anna Kazantseva | Nils Reiter | Stan Szpakowicz
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

An End-to-end Environment for Research Question-Driven Entity Extraction and Network Analysis
Andre Blessing | Nora Echelmeyer | Markus John | Nils Reiter
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
This paper presents an approach to extract co-occurrence networks from literary texts. It is a deliberate decision not to aim for a fully automatic pipeline, as the literary research questions need to guide both the definition of the nature of the things that co-occur and the decision of what counts as co-occurrence. We showcase the approach on a Middle High German romance, Parzival. Manual inspection and discussion show the huge impact that various choices have.
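As a rough illustration of what a co-occurrence network is, the following minimal sketch counts how often two entities are mentioned in the same text segment. It is not the environment described in the paper: the function name `cooccurrence_network`, the whitespace tokenization, the choice of a segment as the co-occurrence window, and the English toy segments are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(segments, entities):
    """Count, for each entity pair, the segments in which both are mentioned.

    Assumption: an entity counts as mentioned if its lowercased name
    appears as a whitespace-separated token of the segment.
    """
    edges = Counter()
    for segment in segments:
        tokens = set(segment.lower().split())
        # Sort so that each pair is always stored in the same order.
        present = sorted(e for e in entities if e.lower() in tokens)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Toy segments written for this example (not taken from any corpus).
segments = [
    "Parzival meets Gurnemanz",
    "Parzival and Condwiramurs marry",
    "Gurnemanz teaches Parzival",
]
entities = ["Parzival", "Gurnemanz", "Condwiramurs"]

network = cooccurrence_network(segments, entities)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Changing the window (sentence, paragraph, chapter) or the rule for what counts as a mention changes the resulting edges, which is the kind of choice the abstract says must be guided by the research question.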
2016

Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Nils Reiter | Beatrice Alex | Kalliopi A. Zervanou
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
2015

Towards Annotating Narrative Segments
Nils Reiter
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)
2010

Identifying Generic Noun Phrases
Nils Reiter | Anette Frank
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Proceedings of the ACL 2010 Student Research Workshop
Seniz Demir | Jan Raab | Nils Reiter | Marketa Lopatkova | Tomek Strzalkowski
Proceedings of the ACL 2010 Student Research Workshop

Using NLP Methods for the Analysis of Rituals
Nils Reiter | Oliver Hellwig | Anand Mishra | Anette Frank | Jens Burkhardt
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper gives an overview of an interdisciplinary research project that is concerned with the application of computational linguistics methods to the analysis of the structure and variance of rituals, as investigated in ritual science. We present motivation and prospects of a computational approach to ritual research, and explain the choice of specific analysis techniques. We discuss design decisions for data collection and processing and present the general NLP architecture. For the analysis of ritual descriptions, we apply the frame semantics paradigm with newly invented frames where appropriate. Using scientific ritual research literature, we experimented with several techniques of automatic extraction of domain terms for the domain of rituals. As ritual research is a highly interdisciplinary endeavour, a vocabulary common to all sub-areas of ritual research is hard to specify and highly controversial. The domain terms extracted from ritual research literature are used as a basis for a common vocabulary and thus help the creation of ritual-specific frames. We applied the tf*idf, χ² and PageRank algorithms to our ritual research literature corpus and two non-domain corpora: the British National Corpus and the British Academic Written English corpus. All corpora have been part-of-speech tagged and lemmatized. The domain terms have been evaluated by two ritual experts independently. Interestingly, the results of the algorithms were different for different parts of speech. This finding is in line with the fact that the inter-annotator agreement also differs between parts of speech.
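For readers unfamiliar with tf*idf, one of the term-weighting schemes named in this abstract, a minimal sketch follows. It uses one common textbook variant (relative term frequency times the natural log of inverse document frequency); the abstract does not say which variant the authors used, and the function name `tf_idf` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf*idf} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy documents written for this example (already tokenized and lowercased).
docs = [
    ["ritual", "priest", "offering", "ritual"],
    ["the", "priest", "spoke"],
    ["the", "market", "opened"],
]
scores = tf_idf(docs)

# Terms frequent in one document but rare elsewhere rank highest.
ranked = sorted(scores[0], key=scores[0].get, reverse=True)
print(ranked)
```

In this toy setup, a term that is frequent in the first document and absent from the others ("ritual") outranks a term shared with another document ("priest"), which is the intuition behind using such scores to surface domain terms against non-domain corpora.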
2009

Proceedings of the Student Research Workshop at EACL 2009
Vera Demberg | Yanjun Ma | Nils Reiter
Proceedings of the Student Research Workshop at EACL 2009
2008

A Resource-Poor Approach for Linking Ontology Classes to Wikipedia Articles
Nils Reiter | Matthias Hartung | Anette Frank
Semantics in Text Processing. STEP 2008 Conference Proceedings
2007

A Semantic Approach To Textual Entailment: System Evaluation and Task Analysis
Aljoscha Burchardt | Nils Reiter | Stefan Thater | Anette Frank
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing