Mao Keyu
Also published as: Keyu Mao
2026
BiCSRouter: Bi-Level Cross-System Routing for Utility-Aware LLM Inference
Mao Keyu | Eiki Murata | Ukyo Honda
Findings of the Association for Computational Linguistics: ACL 2026
Selecting an appropriate LLM configuration for a given query is critical, yet existing routing frameworks operate within a single computational paradigm. To address this gap, we formalize the Cross-System Routing Problem, a hierarchical decision-making task that decomposes routing into intra-regime configuration selection and inter-regime system selection. Building on this, we propose BiCSRouter, a bi-level cross-system routing framework that integrates two orthogonal regimes: intensive reasoning via single-agent systems and extensive collaboration via multi-agent systems. BiCSRouter performs policy learning within each system and employs a lightweight inter-regime router that selects the optimal regime based on predicted performance and cost. Experiments on the MBPP and MATH benchmarks demonstrate that BiCSRouter outperforms 15 representative baselines across three types. On MBPP, compared to the performance ceiling of GPT-5, BiCSRouter achieves a 46% reduction in cost with only a 2% drop in accuracy. Finally, we show that BiCSRouter can extend to additional regimes, highlighting its generality as a cross-system routing framework.
Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders
Ailiang Lin | Zhuoyun Li | Keyu Mao | Kotaro Funakoshi | Manabu Okumura
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have been widely explored for embedding generation. While recent studies show that in-context learning (ICL) effectively enhances the representational capability of LLMs by prepending a few task-related demonstrations, it causes substantial token overhead due to the increased sequence length. In this work, we propose EPIC, a novel embedding-based in-context prompt training strategy that leverages ICL to generate high-quality embeddings while reducing computational burden during both training and inference. This approach replaces discrete text demonstrations with their corresponding continuous embeddings, which not only encourages the LLM to align semantically related text pairs during contrastive learning, but also requires the model to interpret demonstration embeddings as part of the in-context prompt. Consequently, EPIC-trained models achieve excellent embedding performance both with and without in-context prompts at inference time. Comprehensive experiments demonstrate that our method establishes new state-of-the-art results on the MTEB benchmark, surpassing frontier models trained solely on publicly available retrieval data. Extensive ablation studies further validate the effectiveness and necessity of our mechanism.