Pier Felice Balestrucci

2025

Irony is a subjective and pragmatically complex phenomenon, often conveyed through rhetorical figures and interpreted differently across individuals. In this study, we adopt a perspectivist approach, accounting for the socio-demographic background of annotators, to investigate whether specific rhetorical strategies promote a shared perception of irony within demographic groups, and whether Large Language Models (LLMs) reflect specific perspectives. Focusing on the Italian subset of the perspectivist MultiPICo dataset, we manually annotate rhetorical figures in ironic replies using a linguistically grounded taxonomy. The annotation is carried out by expert annotators balanced by generation and gender, enabling us to analyze inter-group agreement and polarization. Our results show that some rhetorical figures lead to higher levels of agreement, suggesting that certain rhetorical strategies are more effective in promoting a shared perception of irony. We fine-tune multilingual LLMs for rhetorical figure classification, and evaluate whether their outputs align with different demographic perspectives. Results reveal that models show varying degrees of alignment with specific groups, reflecting potential perspectivist behavior in model predictions. These findings highlight the role of rhetorical figures in structuring irony perception and underscore the importance of socio-demographics in both annotation and model evaluation.

Can Large Language Models Personalize Dialogues to Generational Styles?
Pier Felice Balestrucci | Ondrej Dusek | Luca Anselma | Alessandro Mazzei
Findings of the Association for Computational Linguistics: EMNLP 2025

We investigate how large language models (LLMs) can produce personalized dialogue responses, specifically focusing on whether they reflect linguistic styles pertaining to different generations: Baby Boomers, Generation X, Generation Y, and Generation Z. We create P-MultiWoZ, a personalized, generation-specific version of MultiWOZ 2.2, by prompting LLMs, and validate its alignment with the original dataset through automatic and human evaluations. To validate the appropriateness of generational linguistic traits, we introduce GeMoSC, a corpus of generation-annotated movie dialogues. Linguistic analysis and a perplexity test suggest that P-MultiWoZ reflects patterns consistent with GeMoSC. Finally, a human evaluation reveals that annotators were mostly able to identify correctly the generation behind P-MultiWoZ dialogues, based only on a single query-reply pair.

When Figures Speak with Irony: Investigating the Role of Rhetorical Figures in Irony Generation with LLMs
Pier Felice Balestrucci | Michael Oliverio | Soda Marem Lo | Luca Anselma | Valerio Basile | Alessandro Mazzei | Viviana Patti
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques
Michael Oliverio | Pier Felice Balestrucci | Alessandro Mazzei | Valerio Basile
Findings of the Association for Computational Linguistics: ACL 2025

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.

A Modular LLM-based Dialog System for Accessible Exploration of Finite State Automata
Stefano Vittorio Porta | Pier Felice Balestrucci | Michael Oliverio | Luca Anselma | Alessandro Mazzei
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

2024

DipInfo-UniTo at the GEM’24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline
Michael Oliverio | Pier Felice Balestrucci | Alessandro Mazzei | Valerio Basile
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

This paper describes the DipInfo-UniTo system participating in the GEM 2024 shared task. We participate only in the Data-to-Text (D2T) task. The DipInfo-UniTo system is based on Mistral (Jiang et al., 2023), a recent Large Language Model (LLM). Most LLMs are capable of generating high-quality text for D2T tasks but, crucially, they often fall short in terms of adequacy, and sometimes exhibit “hallucinations”. To mitigate this issue, we have implemented a generation pipeline that combines LLMs with techniques from the traditional Natural Language Generation (NLG) pipeline. In particular, we use a three-step process, SGA, consisting of (1) Splitting the original set of triples, (2) Generating verbalizations from the resulting split data units, and (3) Aggregating the verbalizations produced in the previous step.
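
The three SGA steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how triples are split or how verbalizations are aggregated, so grouping by subject and plain concatenation are assumptions made here, and `template_verbalizer` is a hypothetical stand-in for the LLM call.

```python
from collections import defaultdict
from typing import Callable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def split_triples(triples: list[Triple]) -> list[list[Triple]]:
    """Step 1, Split. Assumed strategy: group triples sharing a subject."""
    groups: dict[str, list[Triple]] = defaultdict(list)
    for triple in triples:
        groups[triple[0]].append(triple)
    # Dicts keep insertion order, so units follow first appearance.
    return list(groups.values())


def template_verbalizer(unit: list[Triple]) -> str:
    """Step 2, Generate. Hypothetical placeholder for an LLM call."""
    subject = unit[0][0]
    facts = [f"{predicate} {obj}" for _, predicate, obj in unit]
    return f"{subject}: " + ", ".join(facts) + "."


def aggregate(verbalizations: list[str]) -> str:
    """Step 3, Aggregate. Assumed strategy: concatenate in order."""
    return " ".join(verbalizations)


def sga(
    triples: list[Triple],
    generate: Callable[[list[Triple]], str] = template_verbalizer,
) -> str:
    """Run Split, Generate, Aggregate over a set of triples."""
    units = split_triples(triples)
    return aggregate([generate(unit) for unit in units])


example = [
    ("Turin", "country", "Italy"),
    ("Turin", "region", "Piedmont"),
    ("Rome", "country", "Italy"),
]
print(sga(example))
```

Passing a different `generate` callable, for instance one that wraps a model call, changes only step 2 and leaves the splitting and aggregation logic untouched.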

Educational Dialogue Systems for Visually Impaired Students: Introducing a Task-Oriented User-Agent Corpus
Elisa Di Nuovo | Manuela Sanguinetti | Pier Felice Balestrucci | Luca Anselma | Cristian Bernareggi | Alessandro Mazzei
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes a corpus consisting of real-world dialogues in English between users and a task-oriented conversational agent, with interactions revolving around the description of finite state automata. The creation of this corpus is part of a larger research project aimed at developing tools for easier access to educational content, especially in STEM fields, for users with visual impairments. The development of this corpus was motivated precisely by the aim of providing a useful resource to support the design of such tools. The core feature of this corpus is that its creation involved both sighted and visually impaired participants, thus allowing for a greater diversity of perspectives and giving the opportunity to identify possible differences in the way the two groups of participants interacted with the agent. The paper introduces this corpus, giving an account of the process that led to its creation, i.e. the methodology followed to obtain the data, the annotation scheme adopted, and the analysis of the results. Finally, the paper reports the results of a classification experiment on the annotated corpus, and an additional experiment to assess the annotation capabilities of three large language models, in view of a further expansion of the corpus.

I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models
Pier Felice Balestrucci | Silvia Casola | Soda Marem Lo | Valerio Basile | Alessandro Mazzei
Findings of the Association for Computational Linguistics: EMNLP 2024

Generating ironic content is challenging: it requires a nuanced understanding of context and implicit references and balancing seriousness and playfulness. Moreover, irony is highly subjective and can depend on various factors, such as social, cultural, or generational aspects. This paper explores whether Large Language Models (LLMs) can learn to generate ironic responses to social media posts. To do so, we fine-tune two models to generate ironic and non-ironic content and deeply analyze their outputs’ linguistic characteristics, their connection to the original post, and their similarity to the human-written replies. We also conduct a large-scale human evaluation of the outputs. Additionally, we investigate whether LLMs can learn a form of irony tied to a generational perspective, with mixed results.