Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models

Chenchen Yuan; Zheyu Zhang; Gjergji Kasneci

Abstract

Large language models often display heterogeneous moral preferences across settings. We study inference-time steering toward a desired ethical framework while preserving general competence. We present Convergent-Divergent Routing, which traces and edits minimal branch points inside transformer blocks where ethical-framework-related pathways first converge and then diverge. Gating non-target branches at these loci blocks the downstream propagation while leaving upstream computations intact. We find that this intervention alone increases targeted ethical-framework reasoning. To achieve fine-grained control, we adapt Common Spatial Patterns to the residual stream and extract, for each branch-point layer, a pair of directions that discriminate between utilitarian and deontological frameworks. We then introduce Dual Logit Calibration, a closed-form, minimum-ℓ₂-norm update that moves the residual within this two-dimensional subspace so the resulting directional projections align with user-specified preference weights. Experiments on real-life moral dilemmas show that our method reliably achieves preference calibration and largely preserves general capabilities, outperforming recent baselines while providing an interpretable mechanism.
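The abstract describes the calibration step as a closed-form, minimum-ℓ₂-norm update confined to a two-dimensional subspace. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the two directions are given as the columns of a matrix `W`, and that the user's preference weights have already been translated into target projection values `target`. The abstract does not specify that mapping, so it is treated here as an input. The names `min_norm_update`, `W`, `h`, and `target` are hypothetical.

```python
import numpy as np

def min_norm_update(h, W, target):
    """Return the smallest-l2-norm delta such that W.T @ (h + delta) == target.

    h:      (d,) residual-stream vector (assumed layout)
    W:      (d, 2) matrix whose columns are the two discriminative directions
    target: (2,) desired projections onto those directions (assumed given)
    """
    gap = target - W.T @ h   # how far the current projections are from the targets
    gram = W.T @ W           # 2x2 Gram matrix of the two directions
    # The minimum-norm solution of W.T @ delta = gap lies in span(W):
    # delta = W @ (W.T W)^{-1} @ gap
    return W @ np.linalg.solve(gram, gap)

# Toy data standing in for a real residual vector and learned directions.
rng = np.random.default_rng(0)
d = 16
h = rng.normal(size=d)
W = rng.normal(size=(d, 2))
target = np.array([0.7, 0.3])  # illustrative values only

delta = min_norm_update(h, W, target)
h_new = h + delta
```

Because `delta` is a linear combination of the two columns of `W`, the update leaves every component of the residual orthogonal to that subspace unchanged, which is one way to read the abstract's claim that general capabilities are largely preserved.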

Anthology ID: 2026.acl-long.1933
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 41698–41721
URL: https://aclanthology.org/2026.acl-long.1933/
Cite (ACL): Chenchen Yuan, Zheyu Zhang, and Gjergji Kasneci. 2026. Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41698–41721, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models (Yuan et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1933.pdf
Checklist: 2026.acl-long.1933.checklist.pdf