Gradient-Guided Multi-Judge Prompt Optimization

ChenZhuo Zhao; Xinda Wang; Pu Zhao; Yue Huang; Junting Lu; Ziqian Liu; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang

Abstract

Automatic prompt optimization is a practical alternative to fine-tuning for adapting large language models (LLMs), yet existing approaches often trade off signal quality against computational cost. Methods that rely on generative feedback can be informative but expensive to scale, while sampling-based optimization typically requires many evaluations and exhibits high variance. Even loss-driven prompt optimization remains limited by costly segment attribution that scales with prompt length and by overfitting to a single evaluator, which weakens transfer across model families and domains. We propose Gradient-guided Multi-judge Prompt Optimization (GMPO), a scalable framework that improves both efficiency and robustness. GMPO uses a first-order gradient approximation to score segment importance in a continuous masking direction, requiring only one forward and one backward pass. GMPO further employs a generator and multi-judge design in which candidate prompt edits are proposed by a generator and selected using cross-entropy losses aggregated from multiple lightweight judge models, reducing evaluator bias and improving generalization. Experiments across math, reasoning, instruction-following evaluation, and safety robustness benchmarks demonstrate consistent gains with substantially lower optimization overhead.
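
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quadratic toy loss, the mean aggregation rule, and the names `segment_importance` and `select_candidate` are all assumptions made for illustration, since the abstract does not specify the loss, the judges, or how their losses are combined.

```python
from statistics import mean


def segment_importance(weights, target):
    """First-order estimate of the loss change from masking each segment.

    Assumption: a toy stand-in loss L(m) = (sum_i m_i * w_i - target)^2,
    evaluated at the all-ones mask m = 1 (every segment kept). A real
    system would use a language-model loss and automatic differentiation.
    """
    # Forward pass at the all-ones mask.
    residual = sum(weights) - target
    # Backward pass: dL/dm_i = 2 * residual * w_i for this toy loss.
    grads = [2.0 * residual * w for w in weights]
    # Masking segment i moves m_i from 1 to 0, so the first-order
    # estimate of the loss change is -dL/dm_i.
    return [-g for g in grads]


def select_candidate(judge_losses):
    """Return the candidate with the lowest mean loss across judges.

    judge_losses maps each candidate prompt edit to a list of per-judge
    cross-entropy losses. Averaging is an assumed aggregation rule.
    """
    return min(judge_losses, key=lambda cand: mean(judge_losses[cand]))


# Hypothetical numbers, chosen only to exercise the functions.
scores = segment_importance([1.0, 2.0, -0.5], target=3.0)
best = select_candidate({
    "edit_a": [0.9, 0.3, 0.6],   # favoured by the second judge alone
    "edit_b": [0.5, 0.6, 0.55],  # lower loss on average
})
```

All segment scores come from a single gradient evaluation rather than one masked re-evaluation per segment, which is the efficiency argument the abstract makes. In the selection step, `edit_a` would win if only the second judge were consulted, while averaging over all three judges selects `edit_b`.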

Anthology ID: 2026.acl-long.1089
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 23744–23773
URL: https://aclanthology.org/2026.acl-long.1089/
Cite (ACL): ChenZhuo Zhao, Xinda Wang, Pu Zhao, Yue Huang, Junting Lu, Ziqian Liu, Qingwei Lin, Saravan Rajmohan, and Dongmei Zhang. 2026. Gradient-Guided Multi-Judge Prompt Optimization. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23744–23773, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Gradient-Guided Multi-Judge Prompt Optimization (Zhao et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1089.pdf
Checklist: 2026.acl-long.1089.checklist.pdf
