Manuela Sanguinetti

2025

Introducing KIParla Forest: seeds for a UD annotation of interactional syntax
Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Cristina Bosco | Caterina Mauri | Manuela Sanguinetti
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

The present project endeavors to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus, an existing and well-known resource for spoken Italian. This article contextualizes the project, describes the treebank creation process and design choices, and outlines plans for future improvements.

iLostTheCode at SemEval-2025 Task 10: Bottom-up Multilevel Classification of Narrative Taxonomies
Lorenzo Vittorio Concas | Manuela Sanguinetti | Maurizio Atzori
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes the approach used to address the task of narrative classification, proposed as a subtask of Task 10 on Multilingual Characterization and Extraction of Narratives from Online News at the SemEval 2025 campaign. The task consists in assigning all relevant sub-narrative labels from a two-level taxonomy to a given news article in multiple languages (i.e., Bulgarian, English, Hindi, Portuguese and Russian). This involves performing both multi-label and multi-class classification. The model developed for this purpose uses multiple pretrained BERT-based models to create contextualized embeddings that are concatenated and then fed into a simple neural network to compute classification probabilities. Results on the official test set, evaluated using samples $F_1$, range from $0.15$ in Hindi (rank #9) to $0.41$ in Russian (rank #3). Besides an overview of the system and the results obtained in the task, the paper also includes some additional experiments carried out after the evaluation phase, along with a brief discussion of the observed errors.
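
For readers unfamiliar with the metric, samples $F_1$ computes an $F_1$ score separately for each article, comparing its gold and predicted label sets, and then averages these scores over all articles. The Python sketch below is a minimal illustration under stated assumptions: labels are plain strings (the names used are placeholders, not actual sub-narrative labels), and an article whose gold and predicted sets are both empty is treated as a perfect match, which is an assumed convention rather than the official scorer's documented behaviour.

```python
from typing import Iterable, Sequence


def samples_f1(gold: Sequence[Iterable[str]], pred: Sequence[Iterable[str]]) -> float:
    """Mean of per-document F1 between gold and predicted label sets."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    scores = []
    for g_labels, p_labels in zip(gold, pred):
        g, p = set(g_labels), set(p_labels)
        if not g and not p:
            # Assumed convention: two empty label sets count as a perfect match.
            scores.append(1.0)
            continue
        tp = len(g & p)
        # Per-document F1 written as 2*TP / (|gold| + |pred|).
        scores.append(2 * tp / (len(g) + len(p)))
    return sum(scores) / len(scores)


# Placeholder labels: each article gets one partially correct prediction.
gold = [{"A", "B"}, {"C"}]
pred = [{"A"}, {"C", "D"}]
print(round(samples_f1(gold, pred), 4))  # 0.6667
```

Because every article contributes equally to the average, a few articles with many labels cannot dominate the score, which suits a setting where the number of relevant sub-narratives varies from article to article.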

DEMON at SemEval-2025 Task 10: Fine-tuning LLaMA-3 for Multilingual Entity Framing
Matteo Fenu | Manuela Sanguinetti | Maurizio Atzori
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This study introduces a methodology centred on Llama 3 fine-tuning for the classification of entities mentioned within news articles, based on a predefined role taxonomy. The research is conducted as part of SemEval-2025 Task 10, which focuses on the automatic identification of narratives, their classification, and the determination of the roles of the relevant entities involved. The developed system was specifically used within Subtask 1 on Entity Framing. The approach is based on parameter-efficient fine-tuning, in order to minimize computational costs while maintaining reasonably good model performance across all datasets and languages involved. The model achieved promising results on both the development and test sets. Specifically, during the final evaluation phase, it attained an average accuracy of 0.84 on the main role and an average Exact Match Ratio of 0.41 in the prediction of fine-grained roles across all five languages involved, i.e. Bulgarian, English, Hindi, Portuguese and Russian. The best performance was observed for English (3rd place out of 32 participants), on a par with Hindi and Russian. The paper provides an overview of the system adopted for the task and discusses the results obtained.
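
The two figures reported above measure different things. Accuracy compares a single main-role label per entity, whereas the Exact Match Ratio, in its standard multi-label definition, only rewards an entity when its whole set of predicted fine-grained roles coincides with the gold set, so a partially correct set scores zero. A minimal Python sketch of both metrics, assuming role labels are plain strings (the labels below are placeholders, not taken from the task taxonomy):

```python
from typing import Hashable, Iterable, Sequence


def _check(gold: Sequence, pred: Sequence) -> None:
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")


def accuracy(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Share of entities whose predicted main role equals the gold one."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def exact_match_ratio(gold: Sequence[Iterable[str]], pred: Sequence[Iterable[str]]) -> float:
    """Share of entities whose predicted fine-grained role set is identical to gold."""
    _check(gold, pred)
    return sum(set(g) == set(p) for g, p in zip(gold, pred)) / len(gold)


# Placeholder role labels for three entities.
gold_main = ["R1", "R2", "R1"]
pred_main = ["R1", "R1", "R1"]
gold_fine = [{"a"}, {"b", "c"}, {"d"}]
pred_fine = [{"a"}, {"b"}, {"d", "e"}]
print(round(accuracy(gold_main, pred_main), 4))           # 0.6667
print(round(exact_match_ratio(gold_fine, pred_fine), 4))  # 0.3333
```

The strictness of the set comparison explains why an Exact Match Ratio can be much lower than main-role accuracy on the same entities: in the toy data, two entities have one correct fine-grained role but still count as misses.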

2024

Assessing Italian Large Language Models on Energy Feedback Generation: A Human Evaluation Study
Manuela Sanguinetti | Alessandro Pani | Alessandra Perniciano | Luca Zedda | Andrea Loddo | Maurizio Atzori
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

This work presents a comparison of some recently released instruction-tuned large language models for Italian, focusing in particular on their effectiveness in a specific application scenario, i.e., that of delivering energy feedback. This work is part of a larger project aimed at developing a conversational interface for users of a renewable energy community, where clarity and accuracy of the provided feedback are important for proper energy management. This comparison is based on the human evaluation of the output produced by such models using energy data as input. Specifically, the data pertains to the power flows within a household equipped with a photovoltaic (PV) plant and a battery storage system. The goal of the feedback is precisely that of providing the user with such information in a meaningful way based on the specific aspect they intend to monitor at a given moment (e.g., self-consumption levels, the power generated by the PV panels or imported from the main grid, or the battery state of charge). This evaluation experiment has a two-fold purpose: providing an exploratory analysis of the models’ abilities on this specific generation task, relying solely on the information and instructions provided in the prompt, and serving as an initial investigation into their potential as reliable tools for generating user-friendly energy feedback in this intended scenario.

Educational Dialogue Systems for Visually Impaired Students: Introducing a Task-Oriented User-Agent Corpus
Elisa Di Nuovo | Manuela Sanguinetti | Pier Felice Balestrucci | Luca Anselma | Cristian Bernareggi | Alessandro Mazzei
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes a corpus consisting of real-world dialogues in English between users and a task-oriented conversational agent, with interactions revolving around the description of finite state automata. The creation of this corpus is part of a larger research project aimed at developing tools for easier access to educational content, especially in STEM fields, for users with visual impairments. The development of this corpus was precisely motivated by the aim of providing a useful resource to support the design of such tools. The core feature of this corpus is that its creation involved both sighted and visually impaired participants, thus allowing for a greater diversity of perspectives and giving the opportunity to identify possible differences in the way the two groups of participants interacted with the agent. The paper introduces this corpus, giving an account of the process that led to its creation, i.e. the methodology followed to obtain the data, the annotation scheme adopted, and the analysis of the results. Finally, the paper reports the results of a classification experiment on the annotated corpus, and an additional experiment to assess the annotation capabilities of three large language models, in view of a further expansion of the corpus.

QUEEREOTYPES: A Multi-Source Italian Corpus of Stereotypes towards LGBTQIA+ Community Members
Alessandra Teresa Cignarella | Manuela Sanguinetti | Simona Frenda | Andrea Marra | Cristina Bosco | Valerio Basile
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The paper describes a dataset composed of two sub-corpora from two different sources in Italian. The QUEEREOTYPES corpus includes social media texts regarding LGBTQIA+ individuals, behaviors, ideology and events. The texts were collected from Facebook and Twitter in 2018 and were annotated for the presence of stereotypes, and orthogonal dimensions (such as hate speech, aggressiveness, offensiveness, and irony in one sub-corpus, and stance in the other). The resource was developed by Natural Language Processing researchers together with activists from an Italian LGBTQIA+ not-for-profit organization. The creation of the dataset allows the NLP community to study stereotypes against marginalized groups, individuals and, ultimately, to develop proper tools and measures to reduce the online spread of such stereotypes. A test for the robustness of the language resource has been performed by means of 5-fold cross-validation experiments. Finally, text classification experiments have been carried out with a fine-tuned version of AlBERTo (a BERT-based model pre-trained on Italian tweets) and mBERT, obtaining good results on the task of stereotype detection, suggesting that stereotypes towards different targets might share common traits.

Snarci at SemEval-2024 Task 4: Themis Model for Binary Classification of Memes
Luca Zedda | Alessandra Perniciano | Andrea Loddo | Cecilia Di Ruberto | Manuela Sanguinetti | Maurizio Atzori
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper introduces an approach developed for multimodal meme analysis, specifically targeting the identification of persuasion techniques embedded within memes. Our methodology integrates Large Language Models (LLMs) and contrastive learning image encoders to discern the presence of persuasive elements in memes across diverse platforms. By capitalizing on the contextual understanding facilitated by LLMs and the discriminative power of contrastive learning for image encoding, our framework provides a robust solution for detecting and classifying memes with persuasion techniques. The system was used in Task 4 of SemEval 2024, specifically for Subtask 2b (binary classification of the presence of persuasion techniques). It showed promising results overall, achieving a Macro-F1 of 0.7986 on the English test data (i.e., the language the system was trained on) and Macro-F1 scores of 0.66777, 0.47917 and 0.5554, respectively, on the other three “surprise” languages proposed by the task organizers, i.e., Bulgarian, North Macedonian and Arabic. The paper provides an overview of the system, along with a discussion of the results obtained and its main limitations.
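
Macro-F1 averages the F1 of the positive and the negative class with equal weight, so a system cannot score well simply by favouring the majority class. A minimal Python sketch of the metric for a binary setting such as Subtask 2b, assuming labels encoded as 0/1; the toy data are made up for illustration and the zero-score rule for a class absent from both gold and predictions is an assumed convention:

```python
from typing import Sequence, Tuple


def macro_f1(gold: Sequence[int], pred: Sequence[int], labels: Tuple[int, ...] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    per_class = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed 0.0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# 1 = persuasion technique present, 0 = absent (toy labels).
print(round(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.7333
```

In the toy example, the positive class gets F1 = 2/3 and the negative class F1 = 0.8, and their plain average gives the Macro-F1.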

2021

“La ministro è incinta”: A Twitter Account of Women’s Job Titles in Italian
Alessandra Teresa Cignarella | Mirko Lai | Andrea Marra | Manuela Sanguinetti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

Multilingual Irony Detection with Dependency Syntax and Neural Models
Alessandra Teresa Cignarella | Valerio Basile | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso | Farah Benamara
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task from a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution of syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
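
As an illustration of the simplest kind of dependency-based feature, the Python sketch below counts dependency relation labels (the DEPREL column) in text annotated in the CoNLL-U format used by Universal Dependencies treebanks; counts like these could then be fed to a classical classifier. This is an assumed, simplified feature type chosen for illustration, not the feature set used in the paper, and the Italian example sentence is made up.

```python
from collections import Counter


def deprel_counts(conllu: str) -> Counter:
    """Count DEPREL values (8th column) over the syntactic words of a CoNLL-U string."""
    counts: Counter = Counter()
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"malformed CoNLL-U line: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes
        counts[cols[7]] += 1
    return counts


sample = "\n".join([
    "# text = bella giornata",
    "\t".join(["1", "bella", "bello", "ADJ", "_", "_", "2", "amod", "_", "_"]),
    "\t".join(["2", "giornata", "giornata", "NOUN", "_", "_", "0", "root", "_", "_"]),
])
print(deprel_counts(sample))  # Counter({'amod': 1, 'root': 1})
```

Skipping multiword-token lines keeps each syntactic word counted once, since in UD the range line (e.g. an Italian preposition-article contraction) carries no head or relation of its own.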

This paper describes a novel annotation scheme specifically designed for a customer-service context where written interactions take place between a given user and the chatbot of an Italian telecommunication company. More specifically, the scheme aims to detect and highlight two aspects: the presence of errors in the conversation on both sides (i.e. customer and chatbot) and the “emotional load” of the conversation. This can be inferred from the presence of emotions of some kind (especially negative ones) in the customer messages, and from the possible empathic responses provided by the agent. The dataset annotated according to this scheme is currently used to develop the prototype of a rule-based Natural Language Generation system aimed at improving the chatbot responses and the customer experience overall.

Marking Irony Activators in a Universal Dependencies Treebank: The Case of an Italian Twitter Corpus
Alessandra Teresa Cignarella | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso
Proceedings of the Twelfth Language Resources and Evaluation Conference

The recognition of irony is a challenging task in the domain of Sentiment Analysis, and the availability of annotated corpora may be crucial for its automatic processing. In this paper we describe a fine-grained annotation scheme centered on irony, in which we highlight the tokens that are responsible for its activation (irony activators) and their morpho-syntactic features. As our case study, we introduce a recently released Universal Dependencies treebank for Italian which includes ironic tweets: TWITTIRÒ-UD. For the purposes of this study, we enriched the existing annotation in the treebank with a further level that includes irony activators. A description and discussion of the annotation scheme is provided, with a definition of irony activators and the guidelines for their annotation. This qualitative study on the different layers of annotation applied to the same dataset can shed some light on the process of human annotation, and irony annotation in particular, and on the usefulness of this representation for developing computational models of irony to be used for training purposes.

The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given, on the one hand, the increasing number of treebanks featuring user-generated content and, on the other, its somewhat inconsistent treatment in these resources, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks, based on available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.

Content Selection for Explanation Requests in Customer-Care Domain
Luca Anselma | Mirko Di Lascio | Dario Mana | Alessandro Mazzei | Manuela Sanguinetti
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence

This paper describes a content selection module for the generation of explanations in a dialogue system designed for the customer-care domain. First, we describe the construction of a corpus of dialogues containing explanation requests from customers to a virtual agent of a telco; second, we study and formalize the importance of specific information content for the generated message. In particular, we adapt the notions of importance and relevance to the case of schematic knowledge bases.

2019

Is This an Effective Way to Annotate Irony Activators?
Alessandra Teresa Cignarella | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Towards an Italian Learner Treebank in Universal Dependencies
Elisa Di Nuovo | Cristina Bosco | Alessandro Mazzei | Manuela Sanguinetti
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Error Analysis in a Hate Speech Detection Task: The Case of HaSpeeDe-TW at EVALITA 2018
Chiara Francesconi | Cristina Bosco | Fabio Poletto | Manuela Sanguinetti
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

The paper describes the organization of SemEval 2019 Task 5 on the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents, such as the aggressive attitude and the target harassed, to distinguish whether the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019, with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion of the participant systems and the results they achieved in both subtasks.

2018

Long-term Social Media Data Collection at the University of Turin
Valerio Basile | Mirko Lai | Manuela Sanguinetti
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alberto Lavelli | Alessandro Mazzei | Oronzo Antonelli | Fabio Tamburini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Italian Twitter Corpus of Hate Speech against Immigrants
Manuela Sanguinetti | Fabio Poletto | Cristina Bosco | Viviana Patti | Marco Stranisci
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

Annotating Italian Social Media Texts in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alessandro Mazzei | Alberto Lavelli | Fabio Tamburini
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2014

Exploiting catenae in a parallel treebank alignment
Manuela Sanguinetti | Cristina Bosco | Loredana Cupi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper aims to introduce the issues related to the syntactic alignment of a dependency-based multilingual parallel treebank, ParTUT. Our approach to the task starts from a lexical mapping and then attempts to expand it using dependency relations. In developing the system, however, we realized that dependency relations between individual nodes alone were not sufficient to overcome some translation divergences, or shifts, especially when a direct lexical mapping is missing and the syntactic realization differs. For this purpose, we explored the use of a novel syntactic notion introduced in the dependency-theoretical framework, i.e. that of catena (Latin for “chain”), which is intended as a group of words that are continuous with respect to dominance. In relation to the task of aligning parallel dependency structures, catenae can be used to explain and identify those cases of one-to-many or many-to-many correspondences, typical of several translation shifts, that cannot be detected by means of direct word-based mappings or bare syntactic relations. The paper describes the overall structure of the alignment system as it is currently designed, how catenae are extracted from the parallel resource, and their potential relevance to the completion of tree alignment in ParTUT sentences.
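
The definition of catena given above (a group of words continuous with respect to dominance) amounts to requiring that the chosen words form a connected subgraph of the dependency tree. The Python sketch below checks that property for a set of token IDs, assuming heads are encoded as in the CoNLL-U HEAD column (1-based IDs, 0 for the root). It only illustrates the notion; it is not the extraction procedure used for ParTUT, and the English example sentence is made up.

```python
from typing import Dict, Set


def is_catena(words: Set[int], heads: Dict[int, int]) -> bool:
    """True if `words` forms a connected subgraph (a catena) of the dependency tree.

    `heads` maps each token ID (1-based) to its head ID, with 0 for the root.
    """
    if not words or not words.issubset(heads):
        return False
    # In a tree, a node subset is connected iff exactly one of its members
    # has its head outside the subset (that member is the catena's root).
    outside = sum(1 for w in words if heads[w] not in words)
    return outside == 1


# "the old man smiled": the->man, old->man, man->smiled, smiled->root
heads = {1: 3, 2: 3, 3: 4, 4: 0}
print(is_catena({1, 3}, heads))     # True  ("the ... man")
print(is_catena({1, 2}, heads))     # False ("the old": two unconnected dependents)
print(is_catena({2, 3, 4}, heads))  # True  ("old man smiled")
```

The check relies on a simple counting argument: inside the chosen set, every word whose head is also in the set contributes one edge, so a set of n words with k "external" heads induces n - k edges and therefore k separate pieces, which is a single connected catena exactly when k = 1. Note that a catena need not be a contiguous string, as "the ... man" shows.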

2013

Dependency and Constituency in Translation Shift Analysis
Manuela Sanguinetti | Cristina Bosco | Leonardo Lesmo
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

The Parallel-TUT: a multilingual and multiformat treebank
Cristina Bosco | Manuela Sanguinetti | Leonardo Lesmo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper introduces an ongoing project for the development of a parallel treebank for Italian, English and French, i.e. Parallel-TUT, or simply ParTUT. For the development of this resource, both the dependency and constituency-based formats of the Italian Turin University Treebank (TUT) have been applied to a preliminary dataset, which includes the whole text of the Universal Declaration of Human Rights, and sentences from the JRC-Acquis Multilingual Parallel Corpus and the Creative Commons licence. The focus of the project is mainly on the quality of the annotation and on the investigation of some issues related to the alignment of data that the TUT formats allow, also taking into account the availability of conversion tools for displaying data in standard formats, such as TIGER-XML and CoNLL. It is, in fact, our belief that increasing the portability of our treebank could give us the opportunity to access resources and tools provided by other research groups, especially at this stage of the project, where no tool compatible with the TUT format is available to tackle the alignment problems.