Afrozah Nadeem


2025

Alignment of Large Language Models with Human Preferences and Values
Usman Naseem | Gautam Siddharth Kashyap | Kaixuan Ren | Yiran Zhang | Utsav Maskey | Juan Ren | Afrozah Nadeem
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association

Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their reliability and alignment with human expectations remain unresolved challenges. This tutorial introduces the foundations of alignment and provides participants with a conceptual and practical understanding of the field. Core principles such as values, safety, reasoning, and pluralism will be presented through intuitive explanations, worked examples, and case studies. The aim is to equip attendees with the ability to reason about alignment goals, understand how existing methods operate in practice, and critically evaluate their strengths and limitations.

Steering Towards Fairness: Mitigating Political Stance Bias in LLMs
Afrozah Nadeem | Mark Dras | Usman Naseem
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

Recent advancements in large language models (LLMs) have enabled their widespread use across diverse real-world applications. However, concerns remain about their tendency to encode and reproduce ideological biases along political and economic dimensions. In this paper, we employ a framework for probing and mitigating such biases in decoder-based LLMs through analysis of internal model representations. Grounded in the Political Compass Test (PCT), this method uses contrastive pairs to extract and compare hidden-layer activations from models such as Mistral and DeepSeek. We introduce a comprehensive activation extraction pipeline capable of layer-wise analysis across multiple ideological axes, revealing meaningful disparities linked to political framing. Our results show that decoder LLMs systematically encode representational bias across layers, which can be leveraged for effective steering vector-based mitigation. This work provides new insights into how political bias is encoded in LLMs and offers a principled approach to debiasing beyond surface-level output interventions.
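The general idea behind steering vector-based mitigation can be sketched as follows. This is an illustrative toy example of the standard contrastive-activation approach, not the paper's actual pipeline: the function names, the mean-difference construction, and the scaling parameter `alpha` are assumptions for illustration; the paper itself extracts activations layer-wise from models such as Mistral and DeepSeek using PCT-grounded contrastive pairs.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean difference of hidden activations between contrastive pairs.

    pos_acts / neg_acts: arrays of shape (n_pairs, hidden_dim), the
    hidden-layer activations for the two sides of each contrastive pair.
    """
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def steer(hidden: np.ndarray, v: np.ndarray, alpha: float = -1.0) -> np.ndarray:
    """Shift a hidden state along the (unit-normalized) steering direction.

    A negative alpha moves the representation away from the direction
    associated with the probed stance; the magnitude is a tunable choice.
    """
    return hidden + alpha * v / np.linalg.norm(v)

# Toy 2-D activations standing in for real hidden states.
pos = np.array([[1.0, 0.0], [1.0, 0.2]])   # activations for one stance
neg = np.array([[0.0, 1.0], [0.2, 1.0]])   # activations for the contrast
v = steering_vector(pos, neg)
steered = steer(np.zeros(2), v, alpha=1.0)
```

In practice such a vector would be computed per layer from the contrastive-pair activations and added to the residual stream at inference time; the layer choice and scaling are typically selected by measuring the effect on the downstream bias probe.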