Xinyu Tang

2025

DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning
Xinyu Tang | Xiaolei Wang | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Zero-shot in-context learning (ZS-ICL) aims to conduct in-context learning (ICL) without using human-annotated demonstrations.Existing ZS-ICL methods either use large language models (LLMs) to generate (input, label) pairs as pseudo-demonstrations or leverage historical pseudo-demonstrations to help solve the current problem.They assume that all problems are from the same task and traverse them in a random order.However, in real-world scenarios, problems usually come from diverse tasks, and only a few belong to the same task.The random traversing order may generate unreliable pseudo-demonstrations and lead to error accumulation.To address this problem, we reformulate ZS-**ICL** as a planning problem and propose a **D**emonstration-**AW**are Mo**N**te Carlo Tree Search (MCTS) approach (DAWN-ICL), which leverages MCTS to strategically plan the problem-solving trajectories for ZS-ICL.In addition, to achieve effective and efficient Q value estimation, we propose a demonstration-aware Q-value function and use it to enhance the selection phase and accelerate the expansion and simulation phases in MCTS.Extensive experiments demonstrate the effectiveness and efficiency of DAWN-ICL on in-domain and cross-domain scenarios, and it even outperforms ICL using human-annotated demonstrations.The code is available at https://github.com/txy77/MCTS4ZSICL.

pdf bib abs

Recent advancements in long chain-of-thoughts (long CoTs) have significantly improved the reasoning capabilities of large language models (LLMs). Existing work finds that the capability of long CoT reasoning can be efficiently elicited by tuning on only a few examples and can easily transfer to other tasks. This motivates us to investigate whether long CoT reasoning is a general capability for LLMs. In this work, we conduct an empirical analysis for this question from the perspective of representation. We find that LLMs do encode long CoT reasoning as a general capability, with a clear distinction from vanilla CoTs. Furthermore, domain-specific representations are also required for the effective transfer of long CoT reasoning. Inspired by these findings, we propose GLORE, a novel representation engineering method to unleash the general long CoT reasoning capabilities of LLMs. Extensive experiments demonstrate the effectiveness and efficiency of GLORE in both in-domain and cross-domain scenarios. The code is available at https://github.com/txy77/GLoRE.

pdf bib abs

Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
Congzheng Song | Xinyu Tang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Fine-tuning large language models (LLMs) with backpropagation–even for a subset of parameters such as LoRA–can be much more memory-consuming than inference and is often deemed impractical for resource-constrained mobile devices. Alternative methods, such as zeroth-order optimization (ZO), can greatly reduce the memory footprint but come at the cost of significantly slower model convergence (10× to 100× more steps than backpropagation). We propose a memory-efficient implementation of backpropagation (MeBP) on mobile devices that allows flexible trade-offs between memory usage and compute time, while converging faster and achieving better performance than the ZO baseline. We verify the effectiveness of MeBP on an iPhone 15 Pro Max and show that various LLMs, ranging from 0.5B to 4B parameters, can be fine-tuned using less than 1GB of memory. We release an example of the MeBP implementation at https://github.com/apple/ml-mebp.

2023

pdf bib abs

Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models
Xiaolei Wang | Xinyu Tang | Xin Zhao | Jingyuan Wang | Ji-Rong Wen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs), which rely on natural language conversations to satisfy user needs. In this paper, we embark on an investigation into the utilization of ChatGPT for CRSs, revealing the inadequacy of the existing evaluation protocol. It might overemphasize the matching with ground-truth items annotated by humans while neglecting the interactive nature of CRSs. To overcome the limitation, we further propose an **i**nteractive **Eva**luation approach based on **L**L**M**s, named **iEvaLM**, which harnesses LLM-based user simulators. Our evaluation approach can simulate various system-user interaction scenarios. Through the experiments on two public CRS datasets, we demonstrate notable improvements compared to the prevailing evaluation protocol. Furthermore, we emphasize the evaluation of explainability, and ChatGPT showcases persuasive explanation generation for its recommendations. Our study contributes to a deeper comprehension of the untapped potential of LLMs for CRSs and provides a more flexible and realistic evaluation approach for future research about LLM-based CRSs.

Co-authors

Venues

Fix author