Pierpaolo Basile


2023

Graph Databases for Diachronic Language Data Modelling
Barbara McGillivray | Pierluigi Cassotti | Davide Di Pierro | Paola Marongiu | Anas Fahad Khan | Stefano Ferilli | Pierpaolo Basile
Proceedings of the 4th Conference on Language, Data and Knowledge

XL-LEXEME: WiC Pretrained Model for Cross-Lingual LEXical sEMantic changE
Pierluigi Cassotti | Lucia Siciliani | Marco DeGemmis | Giovanni Semeraro | Pierpaolo Basile
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The recent introduction of large-scale datasets for the WiC (Word in Context) task enables the creation of more reliable and meaningful contextualized word embeddings. However, most of the approaches to the WiC task use cross-encoders, which prevent the possibility of deriving comparable word embeddings. In this work, we introduce XL-LEXEME, a Lexical Semantic Change Detection model. XL-LEXEME extends SBERT, highlighting the target word in the sentence. We evaluate XL-LEXEME on the multilingual benchmarks for SemEval-2020 Task 1 - Lexical Semantic Change (LSC) Detection and the RuShiftEval shared task involving five languages: English, German, Swedish, Latin, and Russian. XL-LEXEME outperforms the state-of-the-art in English, German and Swedish with statistically significant differences from the baseline results and obtains state-of-the-art performance in the RuShiftEval shared task.

2021

The Corpora They Are a-Changing: a Case Study in Italian Newspapers
Pierpaolo Basile | Annalina Caputo | Tommaso Caselli | Pierluigi Cassotti | Rossella Varvara
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

The use of automatic methods for the study of lexical semantic change (LSC) has led to the creation of evaluation benchmarks. Benchmark datasets, however, are intimately tied to the corpus used for their creation, which calls into question their reliability as well as the robustness of automatic methods. This contribution investigates these aspects, showing the impact of unforeseen social and cultural dimensions. We also identify a set of additional issues (OCR quality, named entities) that impact the performance of the automatic methods, especially when used to discover LSC.

2020

GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
Pierluigi Cassotti | Annalina Caputo | Marco Polignano | Pierpaolo Basile
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system proposed by the Random team for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focus our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we define a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compare the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.
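
The clustering idea in the abstract above can be illustrated with a minimal sketch. It assumes each target word already has one cross-period similarity score (for example, the cosine similarity between its two temporal embeddings) and fits a two-component, one-dimensional Gaussian mixture with plain EM; words falling in the lower-similarity component are flagged as changed. The function names, the initialisation, and the variance floor are hypothetical choices made for illustration and are not taken from the paper, whose actual algorithm may differ.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_component_gmm(xs, iters=100, var_floor=1e-4):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    means = [min(xs), max(xs)]  # start the components at the extremes
    spread = max((means[1] - means[0]) ** 2 / 4, var_floor)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, var_floor)
            weights[k] = nk / len(xs)
    return weights, means, variances


def detect_changed(similarities):
    """Return the words assigned to the lower-similarity mixture component."""
    words = list(similarities)
    xs = [similarities[w] for w in words]
    weights, means, variances = fit_two_component_gmm(xs)
    low = 0 if means[0] <= means[1] else 1
    changed = set()
    for word, x in zip(words, xs):
        dens = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        if dens[low] > dens[1 - low]:
            changed.add(word)
    return changed


if __name__ == "__main__":
    # Invented toy similarities, not results from any dataset.
    toy = {"w1": 0.90, "w2": 0.88, "w3": 0.92, "w4": 0.91,
           "w5": 0.30, "w6": 0.25, "w7": 0.35}
    print(sorted(detect_changed(toy)))
```

A fixed similarity threshold would need tuning per embedding model; letting a mixture model split the scores into two groups avoids choosing that cut-off by hand, at the cost of assuming the scores are roughly bimodal.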

2019

Diachronic Analysis of Entities by Exploiting Wikipedia Page revisions
Pierpaolo Basile | Annalina Caputo | Seamus Lawless | Giovanni Semeraro
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In the last few years, the increasing availability of large corpora spanning several time periods has opened new opportunities for the diachronic analysis of language. This type of analysis can not only bring to light linguistic phenomena related to the shift of word meanings over time, but also be used to study the impact that societal and cultural trends have on this language change. This paper introduces a new resource for performing the diachronic analysis of named entities built upon Wikipedia page revisions. This resource enables the analysis over time of changes in the relations between entities (concepts), surface forms (words), and the contexts surrounding entities and surface forms, by analysing the whole history of Wikipedia internal links. We provide some use cases that demonstrate the impact of this resource on diachronic studies and outline possible future uses.

Mining the UK Web Archive for Semantic Change Detection
Adam Tsakalidis | Marya Bazzi | Mihai Cucuringu | Pierpaolo Basile | Barbara McGillivray
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Semantic change detection (i.e., identifying words whose meaning has changed over time) has emerged as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000-2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain.

2017

Centroid-based Text Summarization through Compositionality of Word Embeddings
Gaetano Rossiello | Pierpaolo Basile | Giovanni Semeraro
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

Textual similarity is a crucial aspect of many extractive text summarization methods. A bag-of-words representation cannot capture the semantic relationships between concepts when comparing strongly related sentences with no words in common. To overcome this issue, we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. Evaluations on multi-document and multilingual datasets show the effectiveness of the continuous vector representation of words compared to the bag-of-words model. Despite its simplicity, our method achieves good performance even in comparison to more complex deep learning models. Our method is unsupervised and can be adopted in other summarization tasks.
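
As a rough illustration of the centroid idea described in the abstract above, the sketch below composes each sentence vector by averaging the embeddings of its known words, takes the mean of the sentence vectors as the document centroid, and ranks sentences by cosine similarity to that centroid. The averaging composition, the way the centroid is built, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration; the paper's actual composition and sentence-selection steps may differ.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embed(sentence, embeddings):
    """Average the embeddings of the words found in the vocabulary."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    return mean_vector(vectors) if vectors else None


def summarize(sentences, embeddings, k=1):
    """Return the k sentences closest to the centroid of all sentence vectors."""
    embedded = [(s, embed(s, embeddings)) for s in sentences]
    embedded = [(s, v) for s, v in embedded if v is not None]
    if not embedded:
        return []
    centroid = mean_vector([v for _, v in embedded])
    ranked = sorted(embedded, key=lambda sv: cosine(sv[1], centroid), reverse=True)
    return [s for s, _ in ranked[:k]]


if __name__ == "__main__":
    # Invented toy embeddings; real use would load pretrained word vectors.
    toy_embeddings = {
        "cats": [1.0, 0.0], "dogs": [0.9, 0.1], "pets": [0.95, 0.05],
        "stocks": [0.0, 1.0], "fell": [0.1, 0.9],
    }
    document = ["cats and dogs are pets", "dogs chase cats", "stocks fell"]
    print(summarize(document, toy_embeddings, k=2))
```

Because the sentence vectors live in a continuous space, "cats and dogs are pets" and "dogs chase cats" score as similar to the centroid even where their word overlap is partial, which is the property the bag-of-words comparison lacks.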

2015

UNIBA: Combining Distributional Semantic Models and Sense Distribution for Multilingual All-Words Sense Disambiguation and Entity Linking
Pierpaolo Basile | Annalina Caputo | Giovanni Semeraro
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

UNIBA: Sentiment Analysis of English Tweets Combining Micro-blogging, Lexicon and Semantic Features
Pierpaolo Basile | Nicole Novielli
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model
Pierpaolo Basile | Annalina Caputo | Giovanni Semeraro
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

UNIBA: Combining Distributional Semantic Models and Word Sense Disambiguation for Textual Similarity
Pierpaolo Basile | Annalina Caputo | Giovanni Semeraro
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

UNIBA-CORE: Combining Strategies for Semantic Textual Similarity
Annalina Caputo | Pierpaolo Basile | Giovanni Semeraro
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

UNIBA: Distributional Semantics for Textual Similarity
Annalina Caputo | Pierpaolo Basile | Giovanni Semeraro
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Encoding syntactic dependencies by vector permutation
Pierpaolo Basile | Annalina Caputo | Giovanni Semeraro
Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics

2010

UBA: Using Automatic Translation and Wikipedia for Cross-Lingual Lexical Substitution
Pierpaolo Basile | Giovanni Semeraro
Proceedings of the 5th International Workshop on Semantic Evaluation

2008

Combining Knowledge-based Methods and Supervised Learning for Effective Italian Word Sense Disambiguation
Pierpaolo Basile | Marco de Gemmis | Pasquale Lops | Giovanni Semeraro
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

UNIBA: JIGSAW algorithm for Word Sense Disambiguation
Pierpaolo Basile | Marco de Gemmis | Anna Lisa Gentile | Pasquale Lops | Giovanni Semeraro
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)