Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches

Martina Galletti; Giulio Prevedello; Emanuele Brugnoli; Donald Ruggiero Lo Sardo; Pietro Gravino

Abstract

Keyword Extraction (KE) is essential in Natural Language Processing (NLP) for identifying key terms that represent the main themes of a text, and it is vital for applications such as information retrieval, text summarisation, and document classification. Despite the development of various KE methods, including statistical approaches and advanced deep learning models, evaluating their effectiveness remains challenging. Current evaluation metrics focus on keyword quality, balance, and overlap with annotations from authors and professional indexers, but neglect real-world information retrieval needs. This paper introduces a novel evaluation method that overcomes this limitation by using real query data from Google Trends and that can be applied to both supervised and unsupervised KE approaches. We applied this method to three popular KE approaches (YAKE, RAKE and KeyBERT) and found that KeyBERT was the most effective in capturing users’ top queries, with RAKE also showing surprisingly good performance. The code is open-access and publicly available.

Anthology ID: 2025.coling-main.133
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1943–1951
URL: https://aclanthology.org/2025.coling-main.133/
Cite (ACL): Martina Galletti, Giulio Prevedello, Emanuele Brugnoli, Donald Ruggiero Lo Sardo, and Pietro Gravino. 2025. Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches. In Proceedings of the 31st International Conference on Computational Linguistics, pages 1943–1951, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches (Galletti et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.133.pdf
