Emanuele Brugnoli


2025

pdf bib
Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches
Martina Galletti | Giulio Prevedello | Emanuele Brugnoli | Donald Ruggiero Lo Sardo | Pietro Gravino
Proceedings of the 31st International Conference on Computational Linguistics

Keyword Extraction (KE) is essential in Natural Language Processing (NLP) for identifying key terms that represent the main themes of a text, and it is vital for applications such as information retrieval, text summarisation, and document classification. Despite the development of various KE methods — including statistical approaches and advanced deep learning models — evaluating their effectiveness remains challenging. Current evaluation metrics focus on keyword quality, balance, and overlap with annotations from authors and professional indexers, but neglect real-world information retrieval needs. This paper introduces a novel evaluation method designed to overcome this limitation by using real query data from Google Trends and can be used with both supervised and unsupervised KE approaches. We applied this method to three popular KE approaches (YAKE, RAKE and KeyBERT) and found that KeyBERT was the most effective in capturing users’ top queries, with RAKE also showing surprisingly good performance. The code is open-access and publicly available.

2024

pdf bib
Community-based Stance Detection
Emanuele Brugnoli | Donald Ruggiero Lo Sardo
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Stance detection is a critical task in understanding the alignment or opposition of statements within social discourse. In this study, we present a novel stance detection model that labels claim-perspective pairs as either aligned or opposed. The primary innovation of our work lies in our training technique, which leverages social network data from X (formerly Twitter). Our dataset comprises tweets from opinion leaders, political entities and news outlets, along with their followers’ interactions through retweets and quotes. By reconstructing politically aligned communities based on retweet interactions, treated as endorsements, we check these communities against common knowledge representations of the political landscape.Our training dataset consists of tweet/quote pairs where the tweet comes from a political entity and the quote either originates from a follower who exclusively retweets that political entity (treated as aligned) or from a user who exclusively retweets a political entity from an opposing ideological community (treated as opposed). This curated subset is used to train an Italian language model based on the RoBERTa architecture, achieving an accuracy of approximately 85%. We then apply our model to label all tweet/quote pairs in the dataset, analyzing its out-of-sample predictions.This work not only demonstrates the efficacy of our stance detection model but also highlights the utility of social network structures in training robust NLP models. Our approach offers a scalable and accurate method for understanding political discourse and the alignment of social media statements.