Zhiyuan Yao


2026

As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchmarks primarily focus on casual conversation or task-oriented dialogue, failing to capture “long-term project-oriented” interactions where agents must track evolving goals. To bridge this gap, we introduce RealMem, the first benchmark grounded in realistic project scenarios. RealMem comprises over 2,000 cross-session dialogues across eleven scenarios, utilizing natural user queries for evaluation. We propose a synthesis pipeline that integrates Project Foundation Construction, Multi-Agent Dialogue Generation, and Memory and Schedule Management to simulate the dynamic evolution of memory. Experiments reveal that current memory systems face significant challenges in managing the long-term project states and dynamic context dependencies inherent in real-world projects. Our code and datasets are available at https://anonymous.4open.science/r/realmem-A1E4.
With the rise of the Agent Web and Model Context Protocol (MCP), the agent ecosystem is evolving into an open collaborative network, exponentially increasing accessible tools. However, current architectures face severe scalability and generality bottlenecks. To address this, we propose ACE-Router, a pipeline for training history-aware routers to empower precise navigation in large-scale ecosystems. By leveraging a dependency-rich candidate Graph to synthesize multi-turn trajectories, we effectively train routers with dynamic context understanding to create the plug-and-play Light Routing Agent. Experiments on the real-world benchmarks MCP-Universe and MCP-Mark demonstrate superior performance. Notably, ACE-Router exhibits critical properties for the future Agent Web: it not only generalizes to multi-agent collaboration with minimal adaptation but also maintains exceptional robustness against noise and scales effectively to massive candidate spaces. These findings provide a strong empirical foundation for universal orchestration in open-ended ecosystems.Our code is available at https://github.com/euyis1019/ACE-Router.

2025

Despite the promise of large language models based agent framework in stock trading task, their capabilities for comprehensive analysis and multiple different financial assets remain largely unexplored, such as cryptocurrency trading. To evaluate the capabilities of LLM-based agent framework in cryptocurrency trading, we introduce an LLMs-based financial shared task featured at COLING 2025 FinNLP-FNP-LLMFinLegal workshop, named Agent-based Single Cryptocurrency Trading Challenge. This challenge includes two cryptocurrencies: BitCoin and Ethereum. In this paper, we provide an overview of these tasks and datasets, summarize participants’ methods, and present their experimental evaluations, highlighting the effectiveness of LLMs in addressing cryptocurrency trading challenges. To the best of our knowledge, the Agent-based Single Cryptocurrency Trading Challenge is one of the first challenges for assessing LLMs in the financial area. In consequence, we provide detailed observations and take away conclusions for future development in this area.
"本文介绍了我们在第二十四届中文计算语言学大会细粒度中文仇恨言论识别任务中的参赛系统。该任务要求构建结构化仇恨四元组(评论对象、论点、目标群体、是否仇恨),提升模型的细粒度检测与可解释性。我们基于大语言模型,首先评估了LoRA参数高效微调效果,优化了超参数配置;其次对标注数据进行结构化处理,增强数据规范性;最后优化提示词设计,引导模型生成准确的结构化输出。实验表明,三阶段优化提升了模型性能。"

2024