2024
pdf
bib
abs
Teaching Small Language Models Reasoning through Counterfactual Distillation
Tao Feng
|
Yicheng Li
|
Li Chenglin
|
Hao Chen
|
Fei Yu
|
Yin Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
With the rise of large language models (LLMs), many studies are interested in transferring the reasoning capabilities of LLMs to small language models (SLMs). Previous distillation methods usually utilize the capabilities of LLMs to generate chain-of-thought (CoT) samples and teach SLMs via fine-tuning. However, such a standard distillation approach performs poorly when applied to out-of-distribution (OOD) examples, and the diversity of the generated CoT samples is insufficient. In this work, we propose a novel counterfactual distillation framework. Firstly, we leverage LLMs to automatically generate high-quality counterfactual data. Given an input text example, our method generates a counterfactual example that is very similar to the original input, but its task label has been changed to the desired one. Then, we utilize multi-view CoT to enhance the diversity of reasoning samples. Experiments on four NLP benchmarks show that our approach enhances the reasoning capabilities of SLMs and is more robust to OOD data. We also conduct extensive ablations and sample studies to understand the reasoning capabilities of SLMs.
pdf
bib
abs
Mixed Distillation Helps Smaller Language Models Reason Better
Li Chenglin
|
Qianglong Chen
|
Liangyue Li
|
Caiyu Wang
|
Feng Tao
|
Yicheng Li
|
Zulong Chen
|
Yin Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
As large language models (LLMs) have demonstrated impressive multiple step-by-step reasoning capabilities in recent natural language processing (NLP) reasoning tasks, many studies are interested in distilling reasoning abilities into smaller language models (SLMs) via fine-tuning. Previous distillation methods usually utilize the capabilities of LLMs to generate chain-of-thought (CoT) samples to teach SLMs. However, this distillation approach performs poorly in certain scenarios due to the limitations of CoT. In this work, we introduce a novel Mixed Distillation (MD) framework, distilling multiple step-by-step reasoning abilities into SLMs. First, we leverage LLMs to generate multiple step-by-step reasoning rationales by sampling automatically. Then, we create high-quality, well-balanced mixed thought data and design a novel multi-task loss to help SLMs better learn and adaptively activate multiple step-by-step reasoning. Our extensive experiments demonstrate that MD enhances both single-path (using either CoT or PoT) and multi-path (using both CoT and PoT) reasoning abilities of SLMs during inference across reasoning tasks. Notably, a single model generated by MD exceeds the comprehensive performance of an ensemble of two individual CoT and PoT distilled models. Mistral-7B using MD can achieve remarkable improvements of 87.5%, 74.0% and 77.1% on SVAMP, GSM8K and ASDIV, respectively, outperforming the teacher model, GPT-3.5-Turbo. We hope our work provides insight into SLMs’ multiple step-by-step reasoning abilities.
pdf
bib
abs
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
Li Chenglin
|
Qianglong Chen
|
Zhi Li
|
FengTao FengTao
|
Yicheng Li
|
Hao Chen
|
Fei Yu
|
Yin Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
Instruction tuning is a crucial technique for aligning language models with humans’ actual goals in the real world. Extensive research has highlighted the quality of instruction data is essential for the success of this alignment. However, creating high-quality data manually is labor-intensive and time-consuming, which leads researchers to explore using LLMs to synthesize data. Recent studies have focused on using a stronger LLM to iteratively enhance existing instruction data, showing promising results. Nevertheless, previous work often lacks control over the evolution direction, resulting in high uncertainty in the data synthesis process and low-quality instructions. In this paper, we introduce a general and scalable framework, IDEA-MCTS (Instruction Data Enhancement using Monte Carlo Tree Search), a scalable framework for efficiently synthesizing instructions. With tree search and evaluation models, it can efficiently guide each instruction to evolve into a high-quality form, aiding in instruction fine-tuning. Experimental results show that IDEA-MCTS significantly enhances the seed instruction data, raising the average evaluation scores of quality, diversity, and complexity from 2.19 to 3.81. Furthermore, in open-domain benchmarks, experimental results show that IDEA-MCTS improves the accuracy of real-world instruction-following skills in LLMs by an average of 5% in low-resource settings.
pdf
bib
abs
Towards Autonomous Tool Utilization in Language Models: A Unified, Efficient and Scalable Framework
Zhi Li
|
Yicheng Li
|
Hequan Ye
|
Yin Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In recent research, significant advancements have been achieved in tool learning for large language models. Looking towards future advanced studies, the issue of fully autonomous tool utilization is particularly intriguing: given only a query, language models can autonomously decide whether to employ a tool, which specific tool to select, and how to utilize these tools, all without needing any tool-specific prompts within the context. To achieve this, we introduce a unified, efficient, and scalable framework for fine-tuning language models. Based on the degree of tool dependency, we initially categorize queries into three distinct types. By transforming the entire process into a sequential decision-making problem through conditional probability decomposition, our approach unifies the three types and autoregressively generates decision processes. Concurrently, we’ve introduced an “instruct, execute, and reformat” strategy specifically designed for efficient data annotation. Through end-to-end training on the annotated dataset comprising 26 diverse APIs, the model demonstrates a level of self-awareness, automatically seeking tool assistance when necessary. It significantly surpasses original instruction-tuned open-source language models and GPT-3.5/4 on multiple evaluation metrics. To address real-world scalability needs, we’ve enhanced our framework with a dynamic rehearsal strategy for continual learning, proven to require minimal new annotations to exhibit remarkable performance.