Yufei Shi

Also published as: YuFei Shi

2026

From Individual Excellence to Collective Sustainability: Seeking Strategic Equilibrium in Proactive Multi-Agent Teams
Tong Zhang | Yang Wu | Yufei Shi | Rujing Yao | Zhuoren Jiang | Xiaozhong Liu
Findings of the Association for Computational Linguistics: ACL 2026

In heterogeneous scientific teams, proactive team agents can serve as effective assistants for tracking a project's research progress. However, proactive agents always suffer from collaborative myopia: a greedy optimization for immediate task accuracy that ignores the long-term goal of team sustainability. This leads to the Individual-centric Trap, where capable experts (e.g., PIs) are disproportionately overloaded while junior roles remain underutilized. As a result, neglecting opportunity costs in task allocation can implicitly erode the enduring performance of the team. To resolve this imbalance between efficiency and sustainability, we propose GT-PMARL (Game-Theoretic Proactive Multi-Agent Reinforcement Learning). By internalizing the opportunity cost as a key consideration in individual decision-making, GT-PMARL reshapes the collaboration logic of agents. Our framework employs: (1) a Positive-Unlabeled scorer to anchor intervention quality under sparse supervision; and (2) a Nash-Pareto competitive objective to seek an equilibrium between individual task excellence and collective load balancing. Empirical experiments in scientific workflows show that GT-PMARL effectively maintains high performance while preventing experts from over-developing. Our work provides a scalable paradigm for building a sustainable and balanced human-AI collaborative ecosystem.

Cloud-hosted Large Language Models (LLMs) offer unmatched reasoning capabilities and dynamic knowledge, yet submitting raw queries to these external services risks exposing sensitive user intent. Conversely, relying exclusively on trusted local models preserves privacy but often compromises answer quality due to limited parameter scale and knowledge. To resolve this dilemma, we propose Game-theoretic Trustworthy Knowledge Acquisition (GTKA), a framework that formulates the trade-off between knowledge utility and privacy as a strategic game. GTKA consists of three components: (i) a privacy-aware sub-query generator that decomposes sensitive intent into generalized, low-risk fragments; (ii) an adversarial reconstruction attacker that attempts to infer the original query from these fragments, providing adaptive leakage signals; and (iii) a trusted local integrator that synthesizes external responses within a secure boundary. By training the generator and attacker in an alternating adversarial manner, GTKA optimizes the sub-query generation policy to maximize knowledge acquisition accuracy while minimizing the reconstructability of the original sensitive intent. To validate our approach, we construct two sensitive-domain benchmarks in the biomedical and legal fields. Extensive experiments demonstrate that GTKA significantly reduces intent leakage compared to state-of-the-art baselines while maintaining high-fidelity answer quality.

We propose UniVocal, a unified framework that implicitly infers vocal modes from text context to pioneer Speech-Singing Code-Switching (SCS) Synthesis, a task where transitions are autonomously driven by textual semantics, akin to seamless human language blending. Unlike single-mode generation or systems relying on switching-control tags, our proposed UniVocal implicitly infers vocal modes solely from text context. To achieve this, we employ a data-efficient two-stage curriculum learning strategy that progressively trains a competitive TTS system to acquire the desired SCS capability. Addressing data scarcity, we introduce a scalable pipeline to synthesize diverse code-switching data that is both semantically and acoustically natural, alongside a new multi-scenario benchmark, SCSBench. To address limitations of semantic tokenizers in capturing acoustic details, we also introduce a refined cent token and Chain-of-Thought (CoT) generation for planning prosody before content generation, effectively enhancing empathetic speech generation and singing melody. Experimental results demonstrate that UniVocal achieves state-of-the-art performance on SCSBench while maintaining competitive performance on regular speech and singing tasks. Audio samples are available at https://project-univocal-demo.github.io/demo/. The code and dataset are released at https://github.com/FunAudioLLM/FunResearch/tree/main/UniVocal.

Co-authors

Ang Li 1

Venues

Findings 2
ACL 1
