Pia Wenzel Neves

2026

News Credibility Assessment by LLMs and Humans: Implications for Political Bias
Pia Wenzel Neves | Charlott Jakob | Vera Schmitt
The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2026)

In an era of rapid misinformation spread, LLMs have emerged as tools for assessing news credibility at scale. However, their assessments are influenced by social and cultural biases. Studies investigating political bias compare model credibility ratings with expert credibility ratings. Comparing LLMs to the perceptions of political camps extends this approach to detecting similarities in their biases. We compare LLM-generated credibility and bias ratings of news outlets with expert assessments and stratified political opinions collected through surveys. We analyse three models (Llama 3.3 70B, Mixtral 8x7B, and GPT-OSS 120B) across 47 news outlets from two countries (U.S. and Germany). We found that the models demonstrated consistently high alignment with expert ratings, while showing weaker and more variable alignment with public opinions. For U.S. news outlets, all models showed stronger alignment with center-left perceptions, while for German news outlets the alignment was more diverse.

2025

Overview of the SustainEval 2025 Shared Task: Identifying the Topic and Verifiability of Sustainability Report Excerpts
Jakob Prange | Charlott Jakob | Patrick Göttfert | Raphael Huber | Pia Wenzel Neves | Annemarie Friedrich
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops

PolBiX: Detecting LLMs’ Political Bias in Fact-Checking through X-phemisms
Charlott Jakob | David Harbecke | Patrick Parschan | Pia Wenzel Neves | Vera Schmitt
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models are increasingly used in applications requiring objective assessment, which could be compromised by political bias. Many studies found preferences for left-leaning positions in LLMs, but downstream effects on tasks like fact-checking remain underexplored. In this study, we systematically investigate political bias through exchanging words with euphemisms or dysphemisms in German claims. We construct minimal pairs of factually equivalent claims that differ in political connotation, to assess the consistency of LLMs in classifying them as true or false. We evaluate six LLMs and find that, more than political leaning, the presence of judgmental words significantly influences truthfulness assessment. While a few models show tendencies of political bias, this is not mitigated by explicitly calling for objectivism in prompts. Warning: This paper contains content that may be offensive or upsetting.

Co-authors

Raphael Huber 1

Patrick Parschan 1

Jakob Prange 1
