Uncovering Ideological Bias in RAG with Lexical Multidimensional Analysis: A Case Study on COVID-19

Elmira Salari; Maria Claudia Nunes Delfino; Hazem Amamou; José Victor de Souza; Shruti Kshirsagar; Alan Davoust; Anderson Avila

Abstract

This paper studies the impact of retrieved ideologically framed texts on the outputs of large language models (LLMs). While interest in understanding ideological framing in LLMs has recently increased, little attention has been given to this issue in the context of Retrieval-Augmented Generation (RAG). To fill this gap, we design an external knowledge source based on ideologically framed texts about COVID-19 treatments. Our corpus is based on 1,117 academic articles representing discourses about controversial and endorsed treatments for the disease. We propose a corpus linguistics framework, based on Lexical Multidimensional Analysis (LMDA), to identify discourse dimensions within the corpus. LLMs are tasked with answering questions derived from three identified discourse dimensions, and two types of contextual prompts are adopted: the first comprises the user question and ideologically framed texts, and the second contains the question, ideologically framed texts, and LMDA descriptions. Alignment between reference ideologically framed texts and LLM responses is assessed using cosine similarity over lexical and semantic representations. Results demonstrate that retrieved ideologically framed texts shift LLM responses toward the discourse framing represented in the external knowledge, with enhanced prompts further amplifying this effect. Our findings highlight the importance of identifying ideological framings within the RAG framework in order to mitigate not just unintended ideological bias, but also the risks of intentional discourse steering of such models.
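
The abstract's alignment measure, cosine similarity between a reference text and an LLM response, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the lexical representations are built, so the sketch assumes plain term-count vectors, and the function names and example strings are hypothetical. A semantic variant would presumably apply the same formula to embedding vectors instead of counts.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its word tokens (a bag-of-words vector).

    Assumption: raw term counts stand in for the paper's lexical representation.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction; treat it as no alignment
    return dot / (norm_a * norm_b)


def lexical_alignment(reference: str, response: str) -> float:
    """Score how closely a response's vocabulary follows a reference text."""
    return cosine_similarity(term_counts(reference), term_counts(response))


if __name__ == "__main__":
    # Hypothetical strings for illustration only; they are not from the corpus.
    reference = "the treatment was evaluated in randomized trials"
    response = "randomized trials evaluated the treatment"
    print(round(lexical_alignment(reference, response), 3))
```

Scores near 1 indicate that the response reuses the reference's vocabulary in similar proportions, and scores near 0 indicate little lexical overlap.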

Anthology ID: 2026.starsem-conference.7
Volume: Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Saif M. Mohammad, Nedjma Ousidhoum
Venues: *SEM | WS
Publisher: Association for Computational Linguistics
Pages: 111–124
URL: https://aclanthology.org/2026.starsem-conference.7/
Cite (ACL): Elmira Salari, Maria Claudia Nunes Delfino, Hazem Amamou, José Victor de Souza, Shruti Kshirsagar, Alan Davoust, and Anderson Avila. 2026. Uncovering Ideological Bias in RAG with Lexical Multidimensional Analysis: A Case Study on COVID-19. In Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026), pages 111–124, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Uncovering Ideological Bias in RAG with Lexical Multidimensional Analysis: A Case Study on COVID-19 (Salari et al., *SEM 2026)
PDF: https://aclanthology.org/2026.starsem-conference.7.pdf
