SciTechBaitRO: ClickBait Detection for Romanian Science and Technology News

Raluca-Andreea Gînga; Ana Sabina Uban

doi:10.18653/v1/2024.nlp4pi-1.17

Abstract

In this paper, we introduce SciTechBaitRO, a new annotated corpus of clickbait news in a low-resource language, Romanian, and a rarely covered domain, science and technology news. To our knowledge, it is one of the first and the largest corpora of annotated clickbait texts for the Romanian language (almost 11,000 examples), and the first to focus on the sci-tech domain. We evaluate the possibility of automatically detecting clickbait through a series of data analysis and machine learning experiments with varied features and models, including a range of linguistic features, classical machine learning models, deep learning, and pre-trained models. We compare the performance of models using different kinds of features and show that the best results are given by the BERT models, with F1 scores of up to 89%. We additionally evaluate the models in a cross-domain setting on news from other categories (i.e., politics, sports, entertainment) and demonstrate their capacity to generalize by detecting out-of-domain clickbait news with high F1 scores.

Anthology ID: 2024.nlp4pi-1.17
Volume: Proceedings of the Third Workshop on NLP for Positive Impact
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
Venues: NLP4PI | WS
Publisher: Association for Computational Linguistics
Pages: 188–201
URL: https://aclanthology.org/2024.nlp4pi-1.17/
DOI: 10.18653/v1/2024.nlp4pi-1.17
Cite (ACL): Raluca-Andreea Gînga and Ana Sabina Uban. 2024. SciTechBaitRO: ClickBait Detection for Romanian Science and Technology News. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 188–201, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): SciTechBaitRO: ClickBait Detection for Romanian Science and Technology News (Gînga & Uban, NLP4PI 2024)
PDF: https://aclanthology.org/2024.nlp4pi-1.17.pdf
