Evaluating the Prompt Steerability of Large Language Models

Erik Miehling; Michael Desmond; Karthikeyan Natesan Ramamurthy; Elizabeth M. Daly; Kush R. Varshney; Eitan Farchi; Pierre Dognin; Jesus Rios; Djallel Bouneffouf; Miao Liu; Prasanna Sattigeri

doi:10.18653/v1/2025.naacl-long.400

Abstract

Building pluralistic AI requires designing models that can be shaped to represent a wide range of value systems and cultures. Achieving this first requires being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model’s joint behavioral distribution can be shifted from its baseline. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited, due to both a skew in their baseline behavior and an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at https://github.com/IBM/prompt-steering.
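The abstract describes steerability indices only at a high level; their formal definitions are given in the paper and the released repository, not here. As a rough illustration of measuring how far prompting shifts behavior away from a baseline, the Python sketch below uses a hypothetical toy index, not the paper's definition. It assumes each persona dimension is summarized by per-question probabilities of answering toward the dimension's positive pole, and it scores a steered run by the fraction of the remaining headroom (from the baseline mean to the target pole) that steering closes. The names `steerability_index` and `steerability_curve`, the data format, the interpretation of the effort key `k`, and all numbers are illustrative assumptions.

```python
from statistics import mean

# Hypothetical toy index for illustration only; NOT the paper's formal definition.
def steerability_index(baseline, steered, direction):
    """Fraction of the available headroom closed by steering.

    baseline, steered: per-question probabilities in [0, 1] that the model
        answers toward the persona's positive pole (assumed data format).
    direction: +1 to steer toward the positive pole, -1 toward the negative pole.
    A negative result means the model moved away from the target pole.
    """
    b, s = mean(baseline), mean(steered)
    # Room left to move: distance from the baseline mean to the target pole.
    headroom = (1.0 - b) if direction > 0 else b
    if headroom == 0:
        return 0.0  # baseline already sits at the target pole; nothing to shift
    return direction * (s - b) / headroom


def steerability_curve(baseline, steered_by_effort, direction):
    """Map each steering effort k (assumed here to be, e.g., the number of
    persona statements placed in the prompt) to the toy index, sorted by k."""
    return {k: steerability_index(baseline, runs, direction)
            for k, runs in sorted(steered_by_effort.items())}


if __name__ == "__main__":
    # Made-up numbers: a baseline skewed toward the positive pole (mean 0.8).
    baseline = [0.8, 0.7, 0.9]
    toward_pos = {1: [0.85, 0.75, 0.95], 4: [0.9, 0.8, 1.0]}
    toward_neg = {1: [0.7, 0.6, 0.8], 4: [0.5, 0.4, 0.6]}
    print(steerability_curve(baseline, toward_pos, +1))  # about {1: 0.25, 4: 0.5}
    print(steerability_curve(baseline, toward_neg, -1))  # about {1: 0.125, 4: 0.375}
```

In this toy setup, a baseline skewed toward one pole leaves little headroom in that direction, which is one simple way a baseline skew can cap steerability; comparing the two curves at equal effort is one simple way to look for directional asymmetry.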

Anthology ID: 2025.naacl-long.400
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 7874–7900
URL: https://aclanthology.org/2025.naacl-long.400/
DOI: 10.18653/v1/2025.naacl-long.400
Cite (ACL): Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Kush R. Varshney, Eitan Farchi, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu, and Prasanna Sattigeri. 2025. Evaluating the Prompt Steerability of Large Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7874–7900, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Evaluating the Prompt Steerability of Large Language Models (Miehling et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.400.pdf