Fereshte Khani


2024

pdf bib
Prompt Engineering a Prompt Engineer
Qinyuan Ye | Mohamed Ahmed | Reid Pryzant | Fereshte Khani
Findings of the Association for Computational Linguistics: ACL 2024

Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model’s errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic prompt engineering, we argue that their potential is limited due to insufficient guidance for complex reasoning in the meta-prompt. We fill this gap by infusing into the meta-prompt three key components: detailed descriptions, context specification, and a step-by-step reasoning template. The resulting method, named PE2, showcases remarkable versatility across diverse language tasks. It finds prompts that outperform “let’s think step by step” by 6.3% on MultiArith and 3.1% on GSM8K, and outperforms competitive baselines on counterfactual tasks by 6.9%. Further, we show that PE2 can make targeted prompt edits, rectify erroneous prompts, and induce multi-step plans for complex tasks.

2023

pdf bib
Targeted Data Generation: Finding and Fixing Model Weaknesses
Zexue He | Marco Tulio Ribeiro | Fereshte Khani
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users, and underrepresented in the existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups, and generates new data for those subgroups using large language models (LLMs) with a human in the loop. TDG estimates the expected benefit and potential harm of data augmentation for each subgroup, and selects the ones most likely to improve within-group performance without hurting overall performance. In our experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models, while also improving overall test accuracy.

2018

pdf bib
Planning, Inference and Pragmatics in Sequential Language Games
Fereshte Khani | Noah D. Goodman | Percy Liang
Transactions of the Association for Computational Linguistics, Volume 6

We study sequential language games in which two players, each with private information, communicate to achieve a common goal. In such games, a successful player must (i) infer the partner’s private information from the partner’s messages, (ii) generate messages that are most likely to help with the goal, and (iii) reason pragmatically about the partner’s strategy. We propose a model that captures all three characteristics and demonstrate their importance in capturing human behavior on a new goal-oriented dataset we collected using crowdsourcing.

2016

pdf bib
Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings
Fereshte Khani | Martin Rinard | Percy Liang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)