Arianna Redaelli
2024
Is Sentence Splitting a Solved Task? Experiments to the Intersection between NLP and Italian Linguistics
Arianna Redaelli
|
Rachele Sprugnoli
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Sentence splitting, that is the segmentation of the raw input text into sentences, is a fundamental step in text processing. Although it is considered a solved task for texts such as news articles and Wikipedia pages, the performance of systems can vary greatly depending on the text genre. This paper presents the evaluation of the performance of eight sentence splitting tools adopting different approaches (rule-based, supervised, semi-supervised, and unsupervised learning) on Italian 19th-century novels, a genre that has not received sufficient attention so far but which can be an interesting common ground between Natural Language Processing and Digital Humanities.
Annotation and Detection of Emotion Polarity in “I Promessi Sposi”: Dataset and Experiments
Rachele Sprugnoli
|
Arianna Redaelli
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Emotions play a crucial role in literature and are studied by various disciplines, e.g. literary criticism, psychology, anthropology and, more recently, also with computational methods in NLP. However, studies in the Italian context are still limited. This work therefore aims to advance the state of the art in the field of emotion analysis applied to historical texts by proposing a new dataset and describing the results of a set of emotion polarity detection experiments. The text analyzed is “I Promessi Sposi” in its final edition (published in 1840), one of the most important novels in the Italian literary and linguistic canon.
How to Annotate Emotions in Historical Italian Novels: A Case Study on I Promessi Sposi
Rachele Sprugnoli
|
Arianna Redaelli
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
This paper describes the annotation of a chapter taken from I Promessi Sposi, the most famous Italian novel of the 19th century written by Alessandro Manzoni, following 3 emotion classifications. The aim of this methodological paper is to understand: i) how the annotation procedure changes depending on the granularity of the classification, ii) how the different granularities impact the inter-annotator agreement, iii) which granularity allows good coverage of emotions, iv) if the chosen classifications are missing emotions that are important for historical literary texts. The opinion of non-experts is integrated in the present study through an online questionnaire. In addition, preliminary experiments are carried out using the new dataset as a test set to evaluate the performances of different approaches for emotion polarity detection and emotion classification respectively. Annotated data are released both as aggregated gold standard and with non-aggregated labels (that is labels before reconciliation between annotators) so to align with the perspectivist approach, that is an established practice in the Humanities and, more recently, also in NLP.