Rachele Sprugnoli

Also published as: R. Sprugnoli

2025

pdf bib

Annotating Manzoni: Challenges in the Annotation of Lemmas, POS and Features in “I Promessi Sposi”
Rachele Sprugnoli | Arianna Redaelli
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

pdf bib

Ciallabacialla! Modeling and Linking a Regional Lexical Resource to Include Sicilian in the Semantic Web
Rachele Sprugnoli | Giovanni Moretti | Domenico Giuseppe Muscianisi | Eleonora Litta
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

2024

pdf bib

Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Rachele Sprugnoli | Marco Passarotti
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

pdf bib abs

Overview of the EvaLatin 2024 Evaluation Campaign
Rachele Sprugnoli | Federica Iurescia | Marco Passarotti
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

This paper describes the organization and the results of the third edition of EvaLatin, the campaign for the evaluation of Natural Language Processing tools for Latin. The two shared tasks proposed in EvaLatin 2024, i.e., Dependency Parsing and Emotion Polarity Detection, aim to foster research in the field of language technologies for Classical languages. The shared datasets are described and the results obtained by the participants for each task are presented and discussed.

pdf bib abs

How to Annotate Emotions in Historical Italian Novels: A Case Study on I Promessi Sposi
Rachele Sprugnoli | Arianna Redaelli
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

This paper describes the annotation of a chapter taken from I Promessi Sposi, the most famous Italian novel of the 19th century written by Alessandro Manzoni, following 3 emotion classifications. The aim of this methodological paper is to understand: i) how the annotation procedure changes depending on the granularity of the classification, ii) how the different granularities impact the inter-annotator agreement, iii) which granularity allows good coverage of emotions, iv) whether the chosen classifications are missing emotions that are important for historical literary texts. The opinion of non-experts is integrated in the present study through an online questionnaire. In addition, preliminary experiments are carried out using the new dataset as a test set to evaluate the performance of different approaches for emotion polarity detection and emotion classification respectively. Annotated data are released both as an aggregated gold standard and with non-aggregated labels (that is, labels before reconciliation between annotators) so as to align with the perspectivist approach, which is an established practice in the Humanities and, more recently, also in NLP.

pdf bib abs

Annotation and Detection of Emotion Polarity in “I Promessi Sposi”: Dataset and Experiments
Rachele Sprugnoli | Arianna Redaelli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Emotions play a crucial role in literature and are studied by various disciplines, e.g. literary criticism, psychology, anthropology and, more recently, also with computational methods in NLP. However, studies in the Italian context are still limited. This work therefore aims to advance the state of the art in the field of emotion analysis applied to historical texts by proposing a new dataset and describing the results of a set of emotion polarity detection experiments. The text analyzed is “I Promessi Sposi” in its final edition (published in 1840), one of the most important novels in the Italian literary and linguistic canon.

pdf bib

Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Felice Dell'Orletta | Alessandro Lenci | Simonetta Montemagni | Rachele Sprugnoli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

pdf bib abs

Is Sentence Splitting a Solved Task? Experiments to the Intersection between NLP and Italian Linguistics
Arianna Redaelli | Rachele Sprugnoli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Sentence splitting, that is, the segmentation of the raw input text into sentences, is a fundamental step in text processing. Although it is considered a solved task for texts such as news articles and Wikipedia pages, the performance of systems can vary greatly depending on the text genre. This paper presents the evaluation of the performance of eight sentence splitting tools adopting different approaches (rule-based, supervised, semi-supervised, and unsupervised learning) on Italian 19th-century novels, a genre that has not received sufficient attention so far but which can be an interesting common ground between Natural Language Processing and Digital Humanities.

pdf bib

Preface to the CLiC-it 2024 Proceedings
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Rachele Sprugnoli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

2023

pdf bib

“That branch of the Lake of Como...”: Developing a New Resource for the Analysis of I Promessi Sposi and its Historical Translations
Rachele Sprugnoli | Marco Sartor
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2022

pdf bib

Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data
Ilan Kernerman | Sara Carvalho | Carlos A. Iglesias | Rachele Sprugnoli
Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data

pdf bib

Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Rachele Sprugnoli | Marco Passarotti
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

pdf bib abs

Overview of the EvaLatin 2022 Evaluation Campaign
Rachele Sprugnoli | Marco Passarotti | Flavio Massimiliano Cecchini | Margherita Fantoli | Giovanni Moretti
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper describes the organization and the results of the second edition of EvaLatin, the campaign for the evaluation of Natural Language Processing tools for Latin. The three shared tasks proposed in EvaLatin 2022, i.e., Lemmatization, Part-of-Speech Tagging and Features Identification, aim to foster research in the field of language technologies for Classical languages. The shared dataset consists of texts mainly taken from the LASLA corpus. More specifically, the training set includes only prose texts of the Classical period, whereas the test set is organized in three sub-tasks: a Classical sub-task on a prose text of an author not included in the training data, a Cross-genre sub-task on poetic and scientific texts, and a Cross-time sub-task on a text of the 15th century. The results obtained by the participants for each task and sub-task are presented and discussed.

2021

pdf bib

Sentiment Analysis of Latin Poetry: First Experiments on the Odes of Horace
Rachele Sprugnoli | Francesco Mambrini | Marco Passarotti | Giovanni Moretti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

pdf bib

The Annotation of Liber Abbaci, a Domain-Specific Latin Resource
Francesco Grotto | Rachele Sprugnoli | Margherita Fantoli | Maria Simi | Flavio Massimiliano Cecchini | Marco Passarotti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

pdf bib abs

Overview of the EvaLatin 2020 Evaluation Campaign
Rachele Sprugnoli | Marco Passarotti | Flavio Massimiliano Cecchini | Matteo Pellegrini
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

This paper describes the first edition of EvaLatin, a campaign entirely devoted to the evaluation of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e., Lemmatization and Part-of-Speech tagging, are aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set, alongside prose texts by the same authors represented in the training set, also includes data relative to poetry and to the Medieval period. This also allows us to propose the Cross-genre and Cross-time subtasks for each task, in order to evaluate the portability of NLP tools for Latin across different genres and time periods. The results obtained by the participants for each task and subtask are presented and discussed.

pdf bib

UDante: First Steps Towards the Universal Dependencies Treebank of Dante’s Latin Works
Flavio Massimiliano Cecchini | Rachele Sprugnoli | Giovanni Moretti | Marco Passarotti
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

pdf bib abs

Odi et Amo. Creating, Evaluating and Extending Sentiment Lexicons for Latin.
Rachele Sprugnoli | Marco Passarotti | Daniela Corbetta | Andrea Peverelli
Proceedings of the Twelfth Language Resources and Evaluation Conference

Sentiment lexicons are essential for developing automatic sentiment analysis systems, but the resources currently available mostly cover modern languages. Lexicons for ancient languages are few and not evaluated with high-quality gold standards. However, the study of attitudes and emotions in ancient texts is a growing field of research which poses specific issues (e.g., lack of native speakers, limited amount of data, unusual textual genres for the sentiment analysis task, such as philosophical or documentary texts) and can have an impact on the work of scholars coming from several disciplines besides computational linguistics, e.g. historians and philologists. The work presented in this paper aims at providing the research community with a set of sentiment lexicons built by taking advantage of manually-curated resources belonging to the long tradition of creating Latin corpora and lexicons. Our interdisciplinary approach led us to release: i) two automatically generated sentiment lexicons; ii) a gold standard developed by two Latin language and culture experts; iii) a silver standard in which semantic and derivational relations are exploited so as to extend the list of lexical items of the gold standard. In addition, the evaluation procedure is described together with a first application of the lexicons to a Latin tragedy.

pdf bib

Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
Rachele Sprugnoli | Marco Passarotti
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

pdf bib

MultiEmotions-It: a New Dataset for Opinion Polarity and Emotion Analysis for Italian
Rachele Sprugnoli
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

pdf bib abs

Novel Event Detection and Classification for Historical Texts
Rachele Sprugnoli | Sara Tonelli
Computational Linguistics, Volume 45, Issue 2 - June 2019

Event processing is an active area of research in the Natural Language Processing community, but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts, particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while having an impact also on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorized into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry as yet underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus, and new models for automatic annotation.

pdf bib

Vir is to Moderatus as Mulier is to Intemperans - Lemma Embeddings for Latin
Rachele Sprugnoli | Marco Passarotti | Giovanni Moretti
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

pdf bib

Prendo la Parola in Questo Consesso Mondiale: A Multi-Genre 20th Century Corpus in the Political Domain
Sara Tonelli | Rachele Sprugnoli | Giovanni Moretti
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

2018

pdf bib

Analysing the Evolution of Students’ Writing Skills and the Impact of Neo-standard Italian with the help of Computational Linguistics
Rachele Sprugnoli | Sara Tonelli | Alessio Palmero Aprosio | Giovanni Moretti
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

pdf bib

Arretium or Arezzo? A Neural Approach to the Identification of Place Names in Historical Texts
Rachele Sprugnoli
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

pdf bib abs

Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying
Rachele Sprugnoli | Stefano Menini | Sara Tonelli | Filippo Oncini | Enrico Piras
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

Although WhatsApp is used by teenagers as one major channel of cyberbullying, such interactions remain invisible due to the app's privacy policies, which do not allow ex-post data collection. Indeed, most of the information on these phenomena relies on surveys regarding self-reported data. In order to overcome this limitation, we describe in this paper the activities that led to the creation of a WhatsApp dataset to study cyberbullying among Italian students aged 12-13. We present not only the collected chats with annotations about user role and type of offense, but also the living lab created in a collaboration between researchers and schools to monitor and analyse cyberbullying. Finally, we discuss some open issues, dealing with ethical, operational and epistemic aspects.

2017

pdf bib abs

RAMBLE ON: Tracing Movements of Popular Historical Figures
Stefano Menini | Rachele Sprugnoli | Giovanni Moretti | Enrico Bignotti | Sara Tonelli | Bruno Lepri
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present RAMBLE ON, an application integrating a pipeline for frame-based information extraction and an interface to track and display movement trajectories. The code of the extraction pipeline and a navigator are freely available; moreover we display in a demonstrator the outcome of a case study carried out on trajectories of notable persons of the XX Century.

pdf bib

A little bit of bella pianura: Detecting Code-Mixing in Historical English Travel Writing
Rachele Sprugnoli | Sara Tonelli | Giovanni Moretti | Stefano Menini
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

pdf bib abs

The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts
Rachele Sprugnoli | Tommaso Caselli | Sara Tonelli | Giovanni Moretti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset are described together with two sets of classification experiments.

2016

pdf bib abs

NLP and Public Engagement: The Case of the Italian School Reform
Tommaso Caselli | Giovanni Moretti | Rachele Sprugnoli | Sara Tonelli | Damien Lanfrey | Donatella Solda Kutzmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present PIERINO (PIattaforma per l’Estrazione e il Recupero di INformazione Online), a system that was implemented in collaboration with the Italian Ministry of Education, University and Research to analyse the citizens’ comments given in the #labuonascuola survey. The platform includes various levels of automatic analysis such as key-concept extraction and word co-occurrences. Each analysis is displayed through an intuitive view using different types of visualizations, for example radar charts and sunburst charts. PIERINO was effectively used to support shaping the last Italian school reform, proving the potential of NLP in the context of policy making.

pdf bib abs

“Who was Pietro Badoglio?” Towards a QA system for Italian History
Stefano Menini | Rachele Sprugnoli | Antonio Uva
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents QUANDHO (QUestion ANswering Data for italian HistOry), an Italian question answering dataset created to cover a specific domain, i.e. the history of Italy in the first half of the XX century. The dataset includes questions manually classified and annotated with Lexical Answer Types, and a set of question-answer pairs. This resource, freely available for research purposes, has been used to retrain a domain-independent question answering system so as to improve its performance in the domain of interest. Ongoing experiments on the development of a question classifier and an automatic tagger of Lexical Answer Types are also presented.

pdf bib abs

Temporal Information Annotation: Crowd vs. Experts
Tommaso Caselli | Rachele Sprugnoli | Oana Inel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted in two subtasks: one devoted to the recognition of events and temporal expressions and one to the detection and classification of temporal relations. The outcomes of the experiments suggest a valuable use of crowdsourcing annotations also for a complex task like Temporal Processing.

2014

pdf bib abs

CROMER: a Tool for Cross-Document Event and Entity Coreference
Christian Girardi | Manuela Speranza | Rachele Sprugnoli | Sara Tonelli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present CROMER (CROss-document Main Events and entities Recognition), a novel tool to manually annotate event and entity coreference across clusters of documents. The tool has been developed so as to handle large collections of documents, perform collaborative annotation (several annotators can work on the same clusters), and enable the linking of the annotated data to external knowledge sources. Given the availability of semantic information encoded in Semantic Web resources, this tool is designed to support annotators in linking entities and events to DBPedia and Wikipedia, so as to facilitate the automatic retrieval of additional semantic information. In this way, event modelling and chaining is made easy, while guaranteeing the highest interconnection with external resources. For example, the tool can be easily linked to event models such as the Simple Event Model [Van Hage et al., 2011] and the Grounded Annotation Framework [Fokkens et al., 2013].

pdf bib abs

Crowdsourcing for the identification of event nominals: an experiment
Rachele Sprugnoli | Alessandro Lenci
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the design and results of a crowdsourcing experiment on the recognition of Italian event nominals. The aim of the experiment was to assess the feasibility of crowdsourcing methods for a complex semantic task such as distinguishing the eventive interpretation of polysemous nominals taking into consideration various types of syntagmatic cues. Details on the theoretical background and on the experiment set up are provided together with the final results in terms of accuracy and inter-annotator agreement. These results are compared with the ones obtained by expert annotators on the same task. The low values in accuracy and Fleiss kappa of the crowdsourcing experiment demonstrate that crowdsourcing is not always optimal for complex linguistic tasks. On the other hand, the use of non-expert contributors makes it possible to identify the most ambiguous patterns of polysemy and the most useful syntagmatic cues for identifying the eventive reading of nominals.

pdf bib

Annotating Causality in the TempEval-3 Corpus
Paramita Mirza | Rachele Sprugnoli | Sara Tonelli | Manuela Speranza
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

2012

pdf bib abs

CAT: the CELCT Annotation Tool
Valentina Bartalesi Lenzi | Giovanni Moretti | Rachele Sprugnoli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents CAT - CELCT Annotation Tool, a new general-purpose web-based tool for text annotation developed by CELCT (Center for the Evaluation of Language and Communication Technologies). The aim of CAT is to make text annotation an intuitive, easy and fast process. In particular, CAT was created to support human annotators in performing linguistic and semantic text annotation and was designed to improve productivity and reduce time spent on this task. Manual text annotation is, in fact, a time-consuming activity, and conflicts may arise with the strict deadlines annotation projects are frequently subject to. Thanks to its adaptability and user-friendly interface, CAT can positively contribute to improving time management in annotation projects. Further, the tool has a number of features which make it an easy-to-use tool for many types of annotations. Even if the first prototype of CAT has been used to perform temporal and event annotation following the It-TimeML specifications, the tool is general enough to be used for annotating a broad range of linguistic and semantic phenomena. CAT is freely available for research purposes.

2011

pdf bib

Annotating Events, Temporal Expressions and Relations in Italian: the It-Timeml Experience for the Ita-TimeBank
Tommaso Caselli | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Emanuele Pianta | Irina Prodanof
Proceedings of the 5th Linguistic Annotation Workshop

2008

pdf bib abs

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework where participants' systems had the possibility to be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind, in fact, are a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages.

2006

pdf bib abs

In this paper we present work in progress for the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. The first level is represented by temporal expressions, the second level is represented by different types of entities (i.e. person, organizations, locations and geo-political entities), and the third level is represented by relations between entities (e.g. the affiliation relation connecting a person to an organization). So far I-CAB has been manually annotated with temporal expressions, person entities and organization entities. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. As the ACE guidelines have originally been developed for English, part of the effort consisted in adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to include a wider range of entities, such as conjunctions.