Jaekyeom Kim
2024
Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang | Muhammad Khalifa | Lajanugen Logeswaran | Jaekyeom Kim | Moontae Lee | Honglak Lee | Lu Wang
Findings of the Association for Computational Linguistics: ACL 2024
Self-correction has emerged as a promising way to boost the reasoning performance of large language models (LLMs), where LLMs refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether small (≤ 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data that supports the training of self-refinement abilities. First, we leverage correct solutions to guide the model in critiquing its incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of the self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities of two models on five datasets spanning math and commonsense reasoning, with notable performance gains when paired with a strong GPT-4-based verifier, though limitations are identified when using a weak self-verifier for determining when to correct.
Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents
Jaekyeom Kim | Dong-Ki Kim | Lajanugen Logeswaran | Sungryull Sohn | Honglak Lee
Findings of the Association for Computational Linguistics: EMNLP 2024
In this paper, we introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning, where we empirically focus on web navigation tasks. Our approach first discovers the underlying intents from target domain demonstrations in an unsupervised manner, in a highly compact form (up to three words). With the extracted intents, we train our intent predictor to predict the next intent given the agent’s past observations and actions. In particular, we propose a self-exploration approach where the top-k most probable intent predictions are provided as a hint to the pre-trained LLM agent, which leads to enhanced decision-making capabilities. Auto-Intent substantially improves the performance of GPT-3.5, GPT-4, Llama-3.1-70B, and Llama-3.1-405B agents on the large-scale real-website navigation benchmarks from Mind2Web and, through cross-benchmark generalization from Mind2Web, on online navigation tasks from WebArena.
Co-authors
- Honglak Lee 2
- Lajanugen Logeswaran 2
- Muhammad Khalifa 1
- Dong-Ki Kim 1
- Moontae Lee 1