Pierpaolo Basile

2025

pdf bib abs
From Detection to Explanation: Effective Learning Strategies for LLMs in Online Abusive Language Research
Chiara Di Bonaventura | Lucia Siciliani | Pierpaolo Basile | Albert Merono Penuela | Barbara McGillivray
Proceedings of the 31st International Conference on Computational Linguistics

Abusive language detection relies on understanding different levels of intensity, expressiveness and targeted groups, which requires commonsense reasoning, world knowledge and linguistic nuances that evolve over time. Here, we frame the problem as a knowledge-guided learning task, and demonstrate that LLMs’ implicit knowledge without an accurate strategy is not suitable for multi-class detection nor explanation generation. We publicly release GLlama Alarm, the knowledge-Guided version of Llama-2 instruction fine-tuned for multi-class abusive language detection and explanation generation. By being fine-tuned on structured explanations and external reliable knowledge sources, our model mitigates bias and generates explanations that are relevant to the text and coherent with human reasoning, with an average 48.76% better alignment with human judgment according to our expert survey.

pdf bib
Enhancing Linguistic Resources for Diachronic Analysis via Linked Data
Eleonora Ghizzota | Pierpaolo Basile | Claudia D’Amato | Nicola Fanizzi
Proceedings of the 13th Global Wordnet Conference

2024

Intimate Partner Violence refers to the abusive behaviours perpetrated on their own partner. Unfortunately this is a social issue that has witnessed an increase over time, particularly after Covid-19. IPV be circumscribed into two broad categories known as Intimate Partner Violence (IPV) and Cyber Intimate Partner Violence (C-IPV). Social Media and technologies can exacerbate these types of behaviors but some “digital footprints”, such as textual conversations, can be exploited by Artificial Intelligence models to detect and, in turn, prevent them. With this aim in mind, this paper describes a scenario in which the Italian Language Model family LLAmAntino can be exploited to explain the presence of toxicity elements in conversations related to teenage relationships and then educate the interlocutor to recognize these elements in the messages received.

pdf bib abs
DWUGs-IT: Extending and Standardizing Lexical Semantic Change Detection for Italian
Pierluigi Cassotti | Pierpaolo Basile | Nina Tahmasebi
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Lexical Semantic Change Detection (LSCD) is the task of determining whether a word has undergone a change in meaning over time. There has been a marked increase in interest in this task, accompanied by a corresponding growth in the scientific community involved in developing computational approaches to semantic change. In recent years, a number of resources have been made available for the evaluation of LSC models in a number of languages, including English, Swedish, German, Latin, Russian and Chinese. DIACR-ITA is the only existing resource for LSCD in Italian. However, DIACR-ITA has a different format from that used for other languages. In this paper we present DWUGs-IT, which extends the DIACR-ITA dataset with additional target words and usage-sense pair annotations and adapts it to the DURel format, including the first implementation of a LSCD graded task for Italian.

pdf bib abs
Is Explanation All You Need? An Expert Survey on LLM-generated Explanations for Abusive Language Detection
Chiara Di Bonaventura | Lucia Siciliani | Pierpaolo Basile | Albert Merono Penuela | Barbara Mcgillivray
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Explainable abusive language detection has proven to help both users and content moderators, and recent research has focused on prompting LLMs to generate explanations for why a specific text is hateful. Yet, understanding the alignment of these generated explanations with human expectations and judgements is far from being solved. In this paper, we design a before-and-after study recruiting AI experts to evaluate the usefulness and trustworthiness of LLM-generated explanations for abusive language detection tasks, investigating multiple LLMs and learning strategies. Our experiments show that expectations in terms of usefulness and trustworthiness of LLM-generated explanations are not met, as their ratings decrease by 47.78% and 64.32%, respectively, after treatment. Further, our results suggest caution in using LLMs for explanation generation of abusive language detection due to (i) their cultural bias, and (ii) difficulty in reliably evaluating them with empirical metrics. In light of our results, we provide three recommendations to use LLMs responsibly for explainable abusive language detection.

pdf bib abs
A Study on the Soundness of Closed-ended Evaluation of Large Language Models Adapted to the Italian Language
Elio Musacchio | Lucia Siciliani | Pierpaolo Basile | Edoardo Michielon | Marco Pasqualini | Asia Beatrice Uboldi | Giovanni Semeraro
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

With the rising interest in Large Language Models, deep architectures capable of solving a wide range of Natural LanguageGeneration tasks, an increasing number of open weights architectures have been developed and released online. In contrastwith older architectures, which were aimed at solving specific linguistic assignments, Large Language Models have shownoutstanding capabilities in solving several tasks at once, raising the question of whether they can truly comprehend naturallanguage. Nevertheless, evaluating this kind of capability is far from easy. One of the proposed solutions so far is usingbenchmarks that combine various types of tasks. This approach is based on the premise that achieving good performance ineach of these individual tasks can imply having developed a model capable of understanding language. However, while thisassumption is not incorrect, it is evident that it is not sufficient, and the evaluation of Large Language Models still remains anopen challenge. In this paper, we conduct a study aimed at highlighting the potential and limitations of current datasets andhow a new evaluation setting applied to language-adapted Large Language Models may provide more insight than traditionalapproaches.

The rapid development of Large Language Models (LLMs) has called for robust benchmarks to assess their abilities, track progress, and compare iterations. While existing benchmarks provide extensive evaluations across diverse tasks, they predominantly focus on English, leaving other languages underserved. For Italian, the EVALITA campaigns have provided a long-standing tradition of classification-focused shared tasks. However, their scope does not fully align with the nuanced evaluation required for modern LLMs. To address this gap, we introduce “Challenge the Abilities of LAnguage Models in ITAlian” (CALAMITA), a collaborative effort to create a dynamic and growing benchmark tailored to Italian. CALAMITA emphasizes diversity in task design to test a wide range of LLM capabilities through resources natively developed in Italian by the community. This initiative includes a shared platform, live leaderboard, and centralized evaluation framework. This paper outlines the collaborative process, initial challenges, and evaluation framework of CALAMITA.

pdf bib abs
ITA-SENSE - Evaluate LLMs’ ability for ITAlian word SENSE disambiguation: A CALAMITA Challenge
Pierpaolo Basile | Elio Musacchio | Lucia Siciliani
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

The challenge is designed to assess LLMs’ abilities in understanding lexical semantics through Word Sense Disambiguation, providing valuable insights into their performance.The idea is to cast the classical Word Sense Disambiguation task in a generative problem following two directions. Our idea is to propose two tasks: (T1) Given a target word and a sentence in which the word occurs, the LLM must generate the correct meaning definition, (T2) Given a target word and a sentence in which the word occurs, the LLM should choose from a predefined set the correct meaning definition.For T1, we compare the generated definition with respect to the correct one taken from a sense inventory, while for T2, a classical accuracy metric is used.In T1, we adopt metrics that measures the quality of the generated definition such as RougeL and the BERTscore.For CALAMITA, we test LLMs using a zero-shot setting.

pdf bib abs
Leveraging Large Language Models for Spell-Generation in Dungeons & Dragons
Elio Musacchio | Lucia Siciliani | Pierpaolo Basile | Giovanni Semeraro
Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024

Dungeons & Dragons (D&D) is a classic tabletop game with a 50-year history. Its intricate and customizable gameplay allows players to create endless worlds and stories. Due to the highly narrative component of this game, D&D and many other interactive games represent a challenging setting for the Natural Language Generation (NLG) capabilities of LLMs. This paper explores using LLMs to generate new spells, which are one of the most captivating aspects of D&D gameplay. Due to the scarcity of resources available for such a specific task, we build a dataset of 3,259 instances by combining official and fan-made D&D spells. We considered several LLMs in generating spells, which underwent a quantitative and qualitative evaluation. Metrics including Bleu and BertScore were computed for quantitative assessments. Subsequently, we also conducted an in-vivo evaluation with a survey involving D&D players, which could assess the quality of the generated spells as well as their adherence to the rules. Furthermore, the paper emphasizes the open-sourcing of all models, datasets, and findings, aiming to catalyze further research on this topic.

2023

pdf bib abs
XL-LEXEME: WiC Pretrained Model for Cross-Lingual LEXical sEMantic changE
Pierluigi Cassotti | Lucia Siciliani | Marco DeGemmis | Giovanni Semeraro | Pierpaolo Basile
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The recent introduction of large-scale datasets for the WiC (Word in Context) task enables the creation of more reliable and meaningful contextualized word embeddings.However, most of the approaches to the WiC task use cross-encoders, which prevent the possibility of deriving comparable word embeddings.In this work, we introduce XL-LEXEME, a Lexical Semantic Change Detection model.XL-LEXEME extends SBERT, highlighting the target word in the sentence. We evaluate XL-LEXEME on the multilingual benchmarks for SemEval-2020 Task 1 - Lexical Semantic Change (LSC) Detection and the RuShiftEval shared task involving five languages: English, German, Swedish, Latin, and Russian.XL-LEXEME outperforms the state-of-the-art in English, German and Swedish with statistically significant differences from the baseline results and obtains state-of-the-art performance in the RuShiftEval shared task.

pdf bib
Automatic Generation of Common Procurement Vocabulary Codes
Lucia Siciliani | Emanuele Tanzi | Pierpaolo Basile | Pasquale Lops
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

pdf bib
On the Impact of Language Adaptation for Large Language Models: A Case Study for the Italian Language Using Only Open Resources
Pierpaolo Basile | Pierluigi Cassotti | Marco Polignano | Lucia Siciliani | Giovanni Semeraro
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2021

pdf bib
Extracting Relations from Italian Wikipedia using Self-Training
Lucia Siciliani | Pierluigi Cassotti | Pierpaolo Basile | Marco de Gemmis | Pasquale Lops | Giovanni Semeraro
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

pdf bib
Emerging Trends in Gender-Specific Occupational Titles in Italian Newspapers
Pierluigi Cassotti | Andrea Iovine | Pierpaolo Basile | Marco De Gemmis | Giovanni Semeraro
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

pdf bib abs
The Corpora They Are a-Changing: a Case Study in Italian Newspapers
Pierpaolo Basile | Annalina Caputo | Tommaso Caselli | Pierluigi Cassotti | Rossella Varvara
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

The use of automatic methods for the study of lexical semantic change (LSC) has led to the creation of evaluation benchmarks. Benchmark datasets, however, are intimately tied to the corpus used for their creation questioning their reliability as well as the robustness of automatic methods. This contribution investigates these aspects showing the impact of unforeseen social and cultural dimensions. We also identify a set of additional issues (OCR quality, named entities) that impact the performance of the automatic methods, especially when used to discover LSC.

2020

pdf bib
A Diachronic Italian Corpus based on “L’Unità”
Pierpaolo Basile | Annalina Caputo | Tommaso Caselli | Pierluigi Cassotti | Rossella Varvara
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

pdf bib
Analysis of Lexical Semantic Changes in Corpora with the Diachronic Engine
Pierluigi Cassotti | Pierpaolo Basile | Marco de Gemmis | Giovanni Semeraro
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

pdf bib
A Deep Learning Model for the Analysis of Medical Reports in ICD-10 Clinical Coding Task
Marco Polignano | Pierpaolo Basile | Marco de Gemmis | Pasquale Lops | Giovanni Semeraro
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

pdf bib abs
GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
Pierluigi Cassotti | Annalina Caputo | Marco Polignano | Pierpaolo Basile
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system proposed by the Random team for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focus our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we define a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compare the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.

2019

pdf bib
AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets
Marco Polignano | Pierpaolo Basile | Marco de Gemmis | Giovanni Semeraro | Valerio Basile
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

pdf bib
Kronos-it: a Dataset for the Italian Semantic Change Detection Task
Pierpaolo Basile | Giovanni Semeraro | Annalina Caputo
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

pdf bib
An Italian Question Answering System for Structured Data based on Controlled Natural Languages
Lucia Siciliani | Pierpaolo Basile | Giovanni Semeraro | Matteo Mennitti
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

pdf bib abs
Diachronic Analysis of Entities by Exploiting Wikipedia Page revisions
Pierpaolo Basile | Annalina Caputo | Seamus Lawless | Giovanni Semeraro
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In the last few years, the increasing availability of large corpora spanning several time periods has opened new opportunities for the diachronic analysis of language. This type of analysis can bring to the light not only linguistic phenomena related to the shift of word meanings over time, but it can also be used to study the impact that societal and cultural trends have on this language change. This paper introduces a new resource for performing the diachronic analysis of named entities built upon Wikipedia page revisions. This resource enables the analysis over time of changes in the relations between entities (concepts), surface forms (words), and the contexts surrounding entities and surface forms, by analysing the whole history of Wikipedia internal links. We provide some useful use cases that prove the impact of this resource on diachronic studies and delineate some possible future usage.

pdf bib abs
Mining the UK Web Archive for Semantic Change Detection
Adam Tsakalidis | Marya Bazzi | Mihai Cucuringu | Pierpaolo Basile | Barbara McGillivray
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000-2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain.

2018

pdf bib
“Buon appetito!” - Analyzing Happiness in Italian Tweets
Pierpaolo Basile | Nicole Novielli
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2017

pdf bib abs
Centroid-based Text Summarization through Compositionality of Word Embeddings
Gaetano Rossiello | Pierpaolo Basile | Giovanni Semeraro
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

The textual similarity is a crucial aspect for many extractive text summarization methods. A bag-of-words representation does not allow to grasp the semantic relationships between concepts when comparing strongly related sentences with no words in common. To overcome this issue, in this paper we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. The evaluations on multi-document and multilingual datasets prove the effectiveness of the continuous vector representation of words compared to the bag-of-words model. Despite its simplicity, our method achieves good performance even in comparison to more complex deep learning models. Our method is unsupervised and it can be adopted in other summarization tasks.

Co-authors

Venues

acl1

gwc1

ldk1

ws1