To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines

Daniel Russo, Oscar Araque, Marco Guerini


Abstract
Clickbait is a common technique aimed at attracting readers' attention, although it can be inaccurate and lead to misinformation. This work explores the role of current Natural Language Processing methods in reducing its negative impact. To do so, a novel Italian dataset is generated, containing manual annotations for classification, spoiling, and neutralisation of clickbait. In addition, several experimental evaluations are performed to assess the performance of current language models. On the one hand, we evaluate clickbait detection in a multilingual setting, showing that augmenting the data with English instances largely improves overall performance. On the other hand, we explore the generation tasks of clickbait spoiling and neutralisation. The latter is a novel task designed to increase the informativeness of a headline, thus removing the information gap. This work opens a new research avenue that has been largely uncharted for the Italian language.
Anthology ID:
2024.clicit-1.90
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
829–841
URL:
https://aclanthology.org/2024.clicit-1.90/
Cite (ACL):
Daniel Russo, Oscar Araque, and Marco Guerini. 2024. To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 829–841, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
To Click It or Not to Click It: An Italian Dataset for Neutralising Clickbait Headlines (Russo et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.90.pdf