GLaPE: Gold Label-agnostic Prompt Evaluation for Large Language Models

Xuanchang Zhang; Zhuosheng Zhang; Hai Zhao

doi:10.18653/v1/2024.emnlp-main.121

Abstract

Despite the rapid progress of large language models (LLMs), their task performance remains sensitive to prompt design. Recent studies have explored leveraging the LLM itself as an optimizer to identify optimal prompts that maximize task accuracy. However, when evaluating prompts, such approaches rely heavily on elusive manually annotated gold labels to calculate task accuracy for each candidate prompt, which hinders their generality. To overcome this limitation, this work proposes GLaPE, a gold label-agnostic prompt evaluation method that alleviates dependence on gold labels. GLaPE is composed of two critical aspects: self-consistency evaluation of a single prompt and mutual-consistency refinement across multiple prompts. Experimental results on 8 widely recognized reasoning tasks demonstrate that GLaPE can produce more effective prompts, achieving performance comparable to that of prompts derived from manually annotated gold labels. Analysis shows that GLaPE provides reliable evaluations aligned with accuracy, even in the absence of gold labels. Code is publicly available at **Anonymous**.
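As a rough illustration of the first component only, a self-consistency score for a single prompt can be sketched as the share of sampled answers that agree with the most frequent answer. This is a minimal sketch under assumptions, not the paper's implementation: the function name `self_consistency`, the string-valued answers, and the sample data are all hypothetical, and the abstract does not specify how answers are sampled or how the mutual-consistency refinement is computed.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority_answer, agreement_fraction) for one prompt.

    `answers` is a list of final answers obtained by sampling the model
    several times with the same prompt (hypothetical input format).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) yields the answer with the highest count.
    majority, freq = counts.most_common(1)[0]
    return majority, freq / len(answers)


# Hypothetical samples for one question under one candidate prompt.
sampled = ["18", "18", "20", "18", "17"]
answer, score = self_consistency(sampled)
print(answer, score)  # 18 0.6
```

No gold label enters the computation: the score depends only on how often the model agrees with itself, which is what makes this kind of evaluation label-agnostic.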

Anthology ID: 2024.emnlp-main.121
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2027–2039
URL: https://aclanthology.org/2024.emnlp-main.121/
DOI: 10.18653/v1/2024.emnlp-main.121
Cite (ACL): Xuanchang Zhang, Zhuosheng Zhang, and Hai Zhao. 2024. GLaPE: Gold Label-agnostic Prompt Evaluation for Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2027–2039, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): GLaPE: Gold Label-agnostic Prompt Evaluation for Large Language Models (Zhang et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.121.pdf
