Giulio Prevedello
2025
Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches
Martina Galletti | Giulio Prevedello | Emanuele Brugnoli | Donald Ruggiero Lo Sardo | Pietro Gravino
Proceedings of the 31st International Conference on Computational Linguistics
Keyword Extraction (KE) is essential in Natural Language Processing (NLP) for identifying key terms that represent the main themes of a text, and it is vital for applications such as information retrieval, text summarisation, and document classification. Despite the development of various KE methods, including statistical approaches and advanced deep learning models, evaluating their effectiveness remains challenging. Current evaluation metrics focus on keyword quality, balance, and overlap with annotations from authors and professional indexers, but they neglect real-world information retrieval needs. This paper introduces a novel evaluation method that is designed to overcome this limitation by using real query data from Google Trends, and that can be applied to both supervised and unsupervised KE approaches. We applied this method to three popular KE approaches (YAKE, RAKE, and KeyBERT) and found that KeyBERT was the most effective at capturing users' top queries, with RAKE also showing surprisingly good performance. The code is open-access and publicly available.
2024
Lyrics for Success: Embedding Features for Song Popularity Prediction
Giulio Prevedello | Ines Blin | Bernardo Monechi | Enrico Ubaldi
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
Accurate song success prediction is vital for the music industry, guiding promotion and label decisions. Early, accurate predictions are therefore crucial for informed business actions. We investigated the predictive power of lyrics embedding features, alone and in combination with other stylometric features and various Spotify metadata (audio, platform, playlists, reactions). We compiled a dataset of 12,428 Spotify tracks and targeted their popularity 15 days after release. For the embeddings, we used a Large Language Model and compared different configurations. We found that integrating embeddings with other lyrics and audio features improved early-phase predictions, underscoring the importance of a comprehensive approach to success prediction.
Co-authors
- Ines Blin 1
- Emanuele Brugnoli 1
- Martina Galletti 1
- Pietro Gravino 1
- Donald Ruggiero Lo Sardo 1