Chengrui Huang
2025
TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation
Chengrui Huang, Shen Gao, Zhengliang Shi, Dongsheng Wang, Shuo Shang
Findings of the Association for Computational Linguistics: EMNLP 2025
Existing tool-learning methods usually rely on supervised fine-tuning and often overlook fine-grained optimization of internal tool-call details, which limits preference alignment and error discrimination. To overcome these challenges, we propose the **T**oken-level **T**ool-use **P**reference **A**lignment Training Framework (TTPA), a training paradigm for constructing token-level tool-use preference datasets that align LLMs with fine-grained preferences using a novel error-oriented scoring mechanism. TTPA first introduces reversed dataset construction, a method for creating high-quality, multi-turn tool-use datasets by reversing the generation flow. Additionally, we propose _Preference Oriented Tool-use Dataset Construction_ to capture fine-grained preferences by modeling token-level differences during generation. To address biases in scoring, we introduce the _Error-oriented Scoring Mechanism_, which quantifies tool-call errors and can be used as a training signal. Extensive experiments on three diverse benchmark datasets demonstrate that TTPA significantly improves tool-use performance while showing strong generalization ability across models and datasets.
2024
360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System
Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Shuo Shang, Zhiliang Tian, Minlie Huang
Findings of the Association for Computational Linguistics: ACL 2024
Co-authors
- Shen Gao (2 papers)
- Shuo Shang (2 papers)
- Zhengliang Shi (2 papers)
- Minlie Huang (1 paper)
- Hao Li (李浩) (1 paper)