QA‐LIGN: Aligning LLMs through Constitutionally Decomposed QA

Jacob Dineen; Aswin Rrv; Qin Liu; Zhikun Xu; Xiao Ye; Ming Shen; Zhaonan Li; Shijie Lu; Chitta Baral; Muhao Chen; Ben Zhou

doi:10.18653/v1/2025.findings-emnlp.1123

Abstract

Alignment of large language models (LLMs) with principles like helpfulness, honesty, and harmlessness typically relies on scalar rewards that obscure which objectives drive the training signal. We introduce QA-LIGN, which decomposes monolithic rewards into interpretable principle-specific evaluations through structured natural language programs. Models learn through a draft, critique, and revise pipeline, where symbolic evaluation against the rubrics provides transparent feedback for both initial and revised responses during GRPO training. Applied to an uncensored Llama-3.1-8B-Instruct, QA-LIGN reduces attack success rates by up to 68.7% while maintaining a 0.67% false refusal rate, achieving Pareto-optimal safety-helpfulness performance and outperforming both DPO and GRPO with state-of-the-art reward models given equivalent training. These results demonstrate that making reward signals interpretable and modular improves alignment effectiveness, suggesting transparency enhances LLM safety.

Anthology ID: 2025.findings-emnlp.1123
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20619–20642
URL: https://aclanthology.org/2025.findings-emnlp.1123/
DOI: 10.18653/v1/2025.findings-emnlp.1123
Cite (ACL): Jacob Dineen, Aswin Rrv, Qin Liu, Zhikun Xu, Xiao Ye, Ming Shen, Zhaonan Li, Shijie Lu, Chitta Baral, Muhao Chen, and Ben Zhou. 2025. QA‐LIGN: Aligning LLMs through Constitutionally Decomposed QA. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20619–20642, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): QA‐LIGN: Aligning LLMs through Constitutionally Decomposed QA (Dineen et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1123.pdf
Checklist: 2025.findings-emnlp.1123.checklist.pdf
