Xiao Sun
Papers on this page may belong to the following people: Xiao Sun, Xiao Sun
2026
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
Tianyu Liu | Qitan Lv | Hao Li | Xing Gao | Xiao Sun | Xiaoyan Sun
Findings of the Association for Computational Linguistics: ACL 2026
Speculative decoding (SD), in which a small draft model proposes *draft* tokens in advance and the target model then validates them in parallel, has emerged as a promising technique for accelerating LLM inference. Many efforts to improve SD eliminate the draft model and instead generate draft tokens by retrieval, which further reduces drafting overhead and makes deployment considerably easier. However, retrieval-based SD relies on a matching paradigm to retrieve the most relevant reference as draft tokens, and these methods often fail to find matching, accurate drafts. To address this challenge, we propose *LogitSpec*, which expands the retrieval range to find the most relevant reference as drafts. *LogitSpec* is motivated by the observation that the logit of the last token can not only predict **the next token**, but also speculate **the next next token**. Specifically, *LogitSpec* generates draft tokens in two steps: (1) using the last logit to speculate the next next token; (2) retrieving relevant references for both the next token and the next next token. *LogitSpec* is training-free and plug-and-play, so it can be easily integrated into existing LLM inference frameworks. Extensive experiments on a wide range of text generation benchmarks demonstrate that *LogitSpec* achieves up to 2.61× speedup and 3.28 mean accepted tokens per decoding step.
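The two drafting steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the top-ranked entry of the last logit is the next token, that the following `k` entries serve as speculated next next tokens, and that retrieval means looking up the most recent earlier occurrence of a token sequence in the context. The names `top_k_indices`, `retrieve_continuation`, and `draft_candidates`, and the parameters `k` and `max_len`, are hypothetical. Verification of the drafts by the target model is omitted.

```python
from typing import List, Sequence


def top_k_indices(logits: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest logits, best first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def retrieve_continuation(
    context: Sequence[int], query: Sequence[int], max_len: int
) -> List[int]:
    """Find the most recent occurrence of `query` in `context` that is
    followed by at least one token, and return up to `max_len` of those
    following tokens. Returns [] when nothing matches."""
    n = len(query)
    for start in range(len(context) - n, -1, -1):
        if list(context[start:start + n]) == list(query):
            follow = list(context[start + n:start + n + max_len])
            if follow:
                return follow
    return []


def draft_candidates(
    context: Sequence[int],
    last_logits: Sequence[float],
    k: int = 3,
    max_len: int = 4,
) -> List[List[int]]:
    """Build draft sequences from the last logit (assumed scheme, see lead-in)."""
    ranked = top_k_indices(last_logits, k + 1)
    next_token = ranked[0]          # assumed: top-1 entry is the next token
    drafts: List[List[int]] = []

    def add(draft: List[int]) -> None:
        # Keep drafts unique while preserving insertion order.
        if draft not in drafts:
            drafts.append(draft)

    # Retrieval keyed on the next token alone.
    cont = retrieve_continuation(context, [next_token], max_len)
    if cont:
        add([next_token] + cont)

    # Step 1: speculate next next tokens from the remaining top entries.
    for next_next in ranked[1:]:
        # Step 2: retrieval keyed on the (next, next next) pair.
        cont = retrieve_continuation(context, [next_token, next_next], max_len - 1)
        add([next_token, next_next] + cont)

    return drafts


if __name__ == "__main__":
    context = [5, 7, 9, 2, 5, 7, 9, 4]
    last_logits = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 2.0]
    print(draft_candidates(context, last_logits, k=2, max_len=4))
```

In this toy setup, a speculated pair with no match in the context still yields a two-token draft, so the speculation step widens the candidate set even when retrieval finds nothing.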
2018
A Syntactically Constrained Bidirectional-Asynchronous Approach for Emotional Conversation Generation
Jingyuan Li | Xiao Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Traditional neural language models tend to generate generic replies with poor logic and no emotion. In this paper, a syntactically constrained bidirectional-asynchronous approach for emotional conversation generation (E-SCBA) is proposed to address this issue. In our model, pre-generated emotion keywords and topic keywords are asynchronously introduced into the decoding process. This differs markedly from most existing methods, which generate replies from the first word to the last. Experimental results indicate that our approach not only improves the diversity of replies, but also improves both logic and emotion compared with baselines.
2014
Real Time Early-stage Influenza Detection with Emotion Factors from Sina Microblog
Xiao Sun | Jiaqi Ye | Fuji Ren
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing
2012
A MMSM-based Hybrid Method for Chinese MicroBlog Word Segmentation
Xiao Sun | Chengcheng Li | Chenyi Tang | Jiaqi Ye
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing