Crafting Tomorrow’s Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian

Cem Üyük, Danica Rovó, Shaghayeghkolli Shaghayeghkolli, Rabia Varol, Georg Groh, Daryna Dementieva


Abstract
In an era of information overload, amplified by Large Language Models (LLMs), the prevalence of misinformation poses a significant threat to public discourse and societal well-being. A critical concern at present is the identification of machine-generated news. In this work, we take a significant step by introducing a benchmark dataset designed for neural news detection in four languages: English, Turkish, Hungarian, and Persian. The dataset incorporates outputs from multiple multilingual generators (in both zero-shot and fine-tuned setups) such as BloomZ, LLaMa-2, Mistral, Mixtral, and GPT-4. Next, we experiment with a variety of classifiers, ranging from those based on linguistic features to advanced Transformer-based models and LLM prompting. We present the detection results, aiming to examine the interpretability and robustness of machine-generated text detectors across all target languages.
Anthology ID:
2024.nlp4pi-1.25
Volume:
Proceedings of the Third Workshop on NLP for Positive Impact
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
Venue:
NLP4PI
Publisher:
Association for Computational Linguistics
Pages:
271–307
URL:
https://aclanthology.org/2024.nlp4pi-1.25
Cite (ACL):
Cem Üyük, Danica Rovó, Shaghayeghkolli Shaghayeghkolli, Rabia Varol, Georg Groh, and Daryna Dementieva. 2024. Crafting Tomorrow’s Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 271–307, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Crafting Tomorrow’s Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian (Üyük et al., NLP4PI 2024)
PDF:
https://aclanthology.org/2024.nlp4pi-1.25.pdf