VALUE ALIGNMENT TAX: Measuring Value Trade-offs in LLM Alignment

Jiajun Chen; Hua Shen

Abstract

Existing work on value alignment typically characterizes value relations statically, ignoring how alignment interventions (such as prompting, fine-tuning, or preference optimization) reshape the broader value system. In practice, aligning a target value can implicitly shift other values, creating value trade-offs that remain largely unmeasured. We introduce the Value Alignment Tax (VAT), a framework that quantifies value trade-offs by measuring how alignment-induced changes propagate across interconnected values relative to the achieved on-target gain. VAT captures the system-level dynamics of value expression under alignment intervention, enabling evaluation of both intended improvements and unintended side effects. Using a controlled scenario–action dataset grounded in Schwartz value theory, we collect paired pre–post normative judgments and analyze alignment effects across models, values, and interventions. Results show that alignment often produces uneven and structured co-movement among values, revealing systematic trade-offs between target and non-target values. These effects are largely invisible under conventional target-only evaluation but become evident via VAT, highlighting process-level alignment risks and offering new insights into the dynamic nature of value alignment in LLMs. The dataset and code are open-sourced.
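The abstract does not state the VAT formula, so the following is only a minimal sketch of the general idea of relating off-target shifts to on-target gain. It is not the paper's definition. The function name `tradeoff_ratio`, the choice of summed absolute shifts, and the numeric scores are all hypothetical and chosen for illustration.

```python
# Hypothetical illustration only: NOT the paper's VAT definition.
# Assumes each value has a scalar score before and after an alignment intervention.

def tradeoff_ratio(pre, post, target):
    """Total absolute shift on non-target values divided by the on-target gain."""
    gain = post[target] - pre[target]
    if gain <= 0:
        # In this sketch the ratio is only defined when the target actually improved.
        raise ValueError("no on-target gain; ratio is undefined in this sketch")
    spill = sum(abs(post[v] - pre[v]) for v in pre if v != target)
    return spill / gain


# Made-up scores for three Schwartz values (not data from the paper).
pre = {"benevolence": 0.50, "power": 0.50, "security": 0.50}
post = {"benevolence": 0.75, "power": 0.25, "security": 0.50}

ratio = tradeoff_ratio(pre, post, "benevolence")
print(ratio)
```

In this toy case a target-only evaluation would report just the gain on `benevolence`, while the ratio also reflects the equally large drop on `power`.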

Anthology ID: 2026.findings-acl.1749
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 35046–35069
URL: https://aclanthology.org/2026.findings-acl.1749/
Cite (ACL): Jiajun Chen and Hua Shen. 2026. VALUE ALIGNMENT TAX: Measuring Value Trade-offs in LLM Alignment. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35046–35069, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VALUE ALIGNMENT TAX: Measuring Value Trade-offs in LLM Alignment (Chen & Shen, Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1749.pdf
Checklist: 2026.findings-acl.1749.checklist.pdf
