SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph

Jiazheng Li; Yawei Wang; Qiaojing Yan; Yijun Tian; Zhichao Xu; Huan Song; Panpan Xu; Lin Lee Cheong

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities, enabling language agents to excel at single-turn tasks. However, their application to complex, multi-step, and long-horizon tasks remains challenging. While reinforcement learning (RL) offers a promising avenue for addressing these challenges, mainstream approaches typically rely solely on sparse, outcome-based rewards — a limitation that becomes especially problematic for group-based RL algorithms lacking critic models, such as Group Relative Policy Optimization (GRPO). In such methods, uniformly rewarding or penalizing all actions within a trajectory can lead to training instability and suboptimal policies, because beneficial and detrimental actions are often entangled across multi-step interactions. To address this challenge, we propose SALT, a novel and lightweight framework that provides a finer-grained advantage assignment, derived solely from outcome rewards. We achieve this by constructing a graph from trajectories of the same prompt, which allows us to quantify the quality of each step and assign advantages accordingly. Crucially, SALT is designed as a plug-and-play module that seamlessly integrates with existing group-based RL algorithms — requiring no modifications to the rollout procedure and introducing negligible computational overhead. Extensive experiments on the WebShop, ALFWorld, and AppWorld benchmarks with various model sizes demonstrate that SALT consistently improves performance. We also conduct a thorough analysis to validate the design choices behind SALT and offer actionable insights.
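The contrast the abstract draws, between one advantage per trajectory and one advantage per step, can be illustrated with a small sketch. The first function is the standard group-relative normalization used by GRPO-style methods. The second function is only a hypothetical stand-in for a graph-based refinement: it treats each (state, action) pair as an edge shared across trajectories of the same prompt and averages the trajectory-level advantages over every trajectory that passes through it. The abstract does not specify SALT's actual graph construction or scoring rule, so the function names, the edge definition, and the averaging rule are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Trajectory-level advantage in the GRPO style: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_level_advantages(trajectories, rewards):
    """Hypothetical step-level refinement (NOT the paper's exact rule).

    Each trajectory is a list of hashable (state, action) edges. An edge's
    advantage is the mean trajectory-level advantage over all trajectories
    in the group that contain that edge.
    """
    traj_adv = group_relative_advantages(rewards)
    edge_advs = defaultdict(list)
    for steps, adv in zip(trajectories, traj_adv):
        for edge in set(steps):  # count each trajectory once per edge
            edge_advs[edge].append(adv)
    return [[mean(edge_advs[edge]) for edge in steps] for steps in trajectories]


# Two rollouts of the same prompt: a shared first step, then diverging actions.
trajectories = [
    [("s0", "search"), ("s1", "buy_A")],  # succeeds
    [("s0", "search"), ("s1", "buy_B")],  # fails
]
rewards = [1.0, 0.0]
step_adv = step_level_advantages(trajectories, rewards)
```

In this toy group the shared step appears in both a successful and a failed rollout, so its averaged advantage cancels out, while the diverging steps keep the sign of their trajectory's outcome. Uniform assignment would instead have rewarded the shared step in one rollout and penalized it in the other.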

Anthology ID: 2026.findings-eacl.247
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4709–4725
URL: https://aclanthology.org/2026.findings-eacl.247/
Cite (ACL): Jiazheng Li, Yawei Wang, Qiaojing Yan, Yijun Tian, Zhichao Xu, Huan Song, Panpan Xu, and Lin Lee Cheong. 2026. SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4709–4725, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph (Li et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.247.pdf
Checklist: 2026.findings-eacl.247.checklist.pdf
