2024
FRAPPE: FRAming, Persuasion, and Propaganda Explorer
Ahmed Sajwani, Alaa El Setohy, Ali Mekky, Diana Turmakhan, Lara Hassan, Mohamed El Zeftawy, Omar El Herraoui, Osama Mohammed Afzal, Qisheng Liao, Tarek Mahmoud, Zain Muhammad Mujahid, Muhammad Umar Salman, Muhammad Arslan Manzoor, Massa Baali, Jakub Piskorski, Nicolas Stefanovitch, Giovanni Da San Martino, Preslav Nakov
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
The abundance of news sources and the urgent demand for reliable information have led to serious concerns about the threat of misleading information. In this paper, we present FRAPPE, a FRAming, Persuasion, and Propaganda Explorer system. FRAPPE goes beyond conventional article-level news analysis and unveils the intricate linguistic techniques used to shape readers’ opinions and emotions. Our system allows users not only to analyze individual articles for their genre, framings, and use of persuasion techniques, but also to compare the persuasion and framing strategies adopted by a diverse pool of news outlets and countries across multiple languages and topics, thus providing a comprehensive understanding of how information is presented and manipulated. FRAPPE is publicly accessible at https://frappe.streamlit.app/, and a video explaining our system is available at https://www.youtube.com/watch?v=3RlTfSVnZmk.
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection
Mervat Abassy, Kareem Elozeiri, Alexander Aziz, Minh Ngoc Ta, Raj Vardhan Tomar, Bimarsha Adhikari, Saad El Dine Ahmed, Yuxia Wang, Osama Mohammed Afzal, Zhuohan Xie, Jonibek Mansurov, Ekaterina Artemova, Vladislav Mikhailov, Rui Xing, Jiahui Geng, Hasan Iqbal, Zain Muhammad Mujahid, Tarek Mahmoud, Akim Tsvigun, Alham Fikri Aji, Artem Shelmanov, Nizar Habash, Iryna Gurevych, Preslav Nakov
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The ease of access to large language models (LLMs) has led to the widespread proliferation of machine-generated text, and it is now often hard to tell whether a piece of text was written by a human or generated by a machine. This raises concerns about potential misuse, particularly in educational and academic domains. Thus, it is important to develop practical systems that can automate this detection. Here, we present one such system, LLM-DetectAIve, designed for fine-grained detection. Unlike most previous work on machine-generated text detection, which focused on binary classification, LLM-DetectAIve supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished. Category (iii) aims to detect attempts to obfuscate the fact that a text was machine-generated, while category (iv) looks for cases where an LLM was used to polish a human-written text, which is typically acceptable in academic writing, but not in education. Our experiments show that LLM-DetectAIve can effectively identify these four categories, which makes it a potentially useful tool in education, academia, and other domains. LLM-DetectAIve is publicly accessible at https://github.com/mbzuai-nlp/LLM-DetectAIve. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.
A Survey on Predicting the Factuality and the Bias of News Media
Preslav Nakov, Jisun An, Haewoon Kwak, Muhammad Arslan Manzoor, Zain Muhammad Mujahid, Husrev Taha Sencar
Findings of the Association for Computational Linguistics: ACL 2024
The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim or article, either manually or automatically. An increasing number of scholars are focusing on a coarser granularity, aiming to profile entire news outlets, which allows fast identification of potential “fake news” by checking the reliability of their source. Source factuality is also an important element of systems for automatic fact-checking and “fake news” detection, as they need to assess the reliability of the evidence they retrieve online. Political bias detection, which in the Western political landscape is about predicting left-center-right bias, is an equally important topic, which has experienced a similar shift toward profiling entire news outlets. Moreover, there is a clear connection between the two, as highly biased media are less likely to be factual; yet, the two problems have been addressed separately. In this survey, we review the state of the art on media profiling for factuality and bias, arguing for the need to model them jointly. We also shed light on some of the major challenges for modeling bias and factuality jointly. We further discuss interesting recent advances in using different information sources and modalities, which go beyond the text of the articles the target news outlet has published. Finally, we discuss current challenges and outline future research directions.
SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles
Dilshod Azizov, Zain Muhammad Mujahid, Hilal AlQuabeh, Preslav Nakov, Shangsong Liang
Findings of the Association for Computational Linguistics: EMNLP 2024
In an era where information is quickly shared across many cultural and language contexts, the neutrality and integrity of news media are essential. Ensuring that media content remains unbiased and factual is crucial for maintaining public trust. With this in mind, we introduce SAFARI (CroSs-lingual BiAs and Factuality Detection in News MediA and News ARtIcles), a novel corpus of news media and articles for predicting political bias and the factuality of reporting in a multilingual and cross-lingual setup. To the best of our knowledge, this corpus is unprecedented in its collection, and it introduces a dataset for political bias and factuality covering three tasks: (i) media-level, (ii) article-level, and (iii) joint modeling at the article level. At the media and article levels, we evaluate the cross-lingual ability of the models, whereas for joint modeling, we evaluate on English data. Our frameworks set a new benchmark in the cross-lingual evaluation of political bias and factuality. This is achieved through the use of various Multilingual Pre-trained Language Models (MPLMs) and Large Language Models (LLMs) coupled with ensemble learning methods.
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov
Findings of the Association for Computational Linguistics: EMNLP 2024
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present Factcheck-Bench, a holistic end-to-end framework for annotating and evaluating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels for fact-checking and correcting not just the final prediction, but also the intermediate steps that a fact-checking system might need to take. Based on this framework, we construct an open-domain factuality benchmark at three levels of granularity: claim, sentence, and document. We further propose a system, Factcheck-GPT, which follows our framework, and we show that it outperforms several popular LLM fact-checkers. We make our annotation tool, annotated data, benchmark, and code available at https://github.com/yuxiaw/Factcheck-GPT.