Zhengyang Zhou
2026
TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization
Shichao Ma | Zhiyuan Ma | Ming Yang | Xiaofan Li | Xing Wu | Jintao Du | Yu Cheng | Weiqiang Wang | Qiliang Liu | Zhengyang Zhou | Yang Wang
Findings of the Association for Computational Linguistics: ACL 2026
Multi-turn tool-integrated reasoning enables Large Language Models (LLMs) to solve complex tasks through iterative information retrieval. However, current reinforcement learning (RL) frameworks for search-augmented reasoning predominantly rely on sparse outcome-level rewards, leading to a "Double Homogenization Dilemma." This manifests as (1) process homogenization, where the thinking, reasoning, and tool use involved in generation are ignored, and (2) intra-group homogenization, where coarse-grained outcome rewards often make intra-group advantage estimation inefficient in methods such as Group Relative Policy Optimization (GRPO) during sampling. To address this, we propose Turn-level Stage-aware Policy Optimization (TSPO). TSPO introduces the First-Occurrence Latent Reward (FOLR) mechanism, which allocates partial rewards to the step where the ground-truth answer first appears, thereby preserving process-level signals and increasing reward variance within groups without requiring external reward models or any annotations. Extensive experiments demonstrate that TSPO significantly outperforms state-of-the-art baselines, achieving average performance gains of 24% and 13.6% on Qwen2.5-3B and 7B models, respectively. Code is available at https://github.com/Flipped-May/TSPO.
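The sketch below illustrates, under stated assumptions, why identical outcome-only rewards leave GRPO without a learning signal and how a first-occurrence partial reward can restore within-group variance. It is not the TSPO implementation from the linked repository: the helper names (first_occurrence, folr_reward, group_advantages), the partial reward value alpha = 0.5, and the case-insensitive substring match used to detect the ground-truth answer are all hypothetical choices made for illustration. The group-normalized advantage (r - mean) / (std + eps) follows the usual GRPO formulation.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

def first_occurrence(turns: List[str], answer: str) -> Optional[int]:
    """Return the index of the first turn whose text contains the answer, or None.
    Assumption: a case-insensitive substring match stands in for answer detection."""
    target = answer.lower()
    for i, text in enumerate(turns):
        if target in text.lower():
            return i
    return None

def folr_reward(turns: List[str], final_answer: str, answer: str,
                alpha: float = 0.5) -> Tuple[float, Optional[int]]:
    """Hypothetical FOLR-style reward: 1.0 for a correct final answer, a partial
    reward alpha if the answer surfaced in some turn, else 0.0. Also returns the
    first-occurrence turn index, where a turn-level reward could be attached."""
    idx = first_occurrence(turns, answer)
    if final_answer.strip().lower() == answer.lower():
        return 1.0, idx
    if idx is not None:
        return alpha, idx
    return 0.0, None

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-normalized advantages: (r - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

answer = "Paris"
group = [  # (turn texts, final answer); every rollout answers incorrectly
    (["search: capital of France", "doc: Paris is the capital"], "Lyon"),
    (["search: French capital", "doc: ... Paris ..."], "Marseille"),
    (["search: France", "doc: France is in Europe"], "Nice"),
    (["search: Europe", "doc: unrelated text"], "Berlin"),
]
outcome = [1.0 if final.lower() == answer.lower() else 0.0 for _, final in group]
folr = [folr_reward(turns, final, answer)[0] for turns, final in group]
print(group_advantages(outcome))  # [0.0, 0.0, 0.0, 0.0]: no learning signal
print(group_advantages(folr))     # positive for the first two rollouts, negative for the rest
```

In this toy group every rollout ends with a wrong answer, so the outcome-only rewards are identical and every advantage collapses to zero; the partial reward separates the two rollouts that retrieved the answer from the two that never saw it. A turn-level method would additionally attach the reward at the returned turn index rather than to the whole trajectory, which this sketch does not attempt.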
Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models
Zhiqing Cui | Binwu Wang | Qingxiang Liu | Yeqiang Wang | Zhengyang Zhou | Yuxuan Liang | Yang Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have emerged as a promising avenue for time series forecasting, offering the potential to integrate multimodal data. However, existing LLM-based approaches face notable limitations, such as the marginalized role of the LLM in model architectures, reliance on coarse statistical text prompts, and a lack of interpretability. In this work, we introduce Augur, a fully LLM-driven time series forecasting framework that exploits the causal reasoning of LLMs to discover and use directed causal associations among covariates. Augur uses a two-stage teacher-student architecture in which a powerful teacher LLM infers a directed causal graph from time series using heuristic search together with pairwise causality testing. A lightweight student agent then refines the graph and is fine-tuned on high-confidence causal associations, which are encoded as rich textual prompts, to perform forecasting. This design improves predictive accuracy while yielding transparent, traceable reasoning about variable interactions. Extensive experiments on real-world datasets against 25 baselines demonstrate that Augur achieves competitive performance and robust zero-shot generalization.
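As a rough illustration of the last step described above, the following sketch filters a teacher-inferred causal graph down to high-confidence edges and renders them as a textual prompt for a student model. The CausalEdge type, its confidence field, the 0.8 threshold, and the prompt wording are all hypothetical; the abstract does not specify how Augur scores edges or formats its prompts.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CausalEdge:
    """A directed association cause -> effect with a hypothetical teacher-assigned confidence."""
    cause: str
    effect: str
    confidence: float  # assumed to lie in [0, 1]

def edges_to_prompt(edges: List[CausalEdge], target: str, threshold: float = 0.8) -> str:
    """Keep edges at or above the threshold, strongest first, and render them as prompt text."""
    kept = sorted(
        (e for e in edges if e.confidence >= threshold),
        key=lambda e: e.confidence,
        reverse=True,
    )
    lines = [f"Forecast target: {target}.", "Directed causal associations among covariates:"]
    if kept:
        lines += [f"- {e.cause} -> {e.effect} (confidence {e.confidence:.2f})" for e in kept]
    else:
        lines.append("- none above threshold")
    return "\n".join(lines)

edges = [
    CausalEdge("temperature", "load", 0.92),
    CausalEdge("humidity", "load", 0.55),  # dropped: below the threshold
    CausalEdge("holiday", "load", 0.85),
]
print(edges_to_prompt(edges, target="load"))
```

Sorting by confidence places the strongest associations first in the prompt; whether ordering matters to a fine-tuned student is an open design choice, not something the abstract states.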