Sequential decision-making refers to algorithms that take into account the dynamics of the environment, where early decisions affect subsequent decisions. With large language models (LLMs) demonstrating powerful capabilities between tasks, we can’t help but ask: Can Current LLMs Effectively Make Sequential Decisions? In order to answer this question, we propose the UNO Arena based on the card game UNO to evaluate the sequential decision-making capability of LLMs and explain in detail why we choose UNO. In UNO Arena, We evaluate the sequential decision-making capability of LLMs dynamically with novel metrics based Monte Carlo methods. We set up random players, DQN-based reinforcement learning players, and LLM players (e.g. GPT-4, Gemini-pro) for comparison testing. Furthermore, in order to improve the sequential decision-making capability of LLMs, we propose the TUTRI player, which can involves having LLMs reflect their own actions with the summary of game history and the game strategy. Numerous experiments demonstrate that the TUTRI player achieves a notable breakthrough in the performance of sequential decision-making compared to the vanilla LLM player.
While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach that uses manifold learning and the Information Bottleneck (IB) measure to merge similar layers, reducing model size while preserving essential performance. We evaluate MKA on multiple benchmark datasets and various LLMs. Our findings show that MKA not only preserves model performance but also achieves substantial compression ratios, outperforming traditional pruning methods. Moreover, when coupled with quantization, MKA delivers even greater compression. Specifically, on the MMLU dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75% with a minimal performance decrease of only 2.82%. The proposed MKA method offers a resource-efficient and performance-preserving model compression technique for LLMs. We make our code available at https://github.com/SempraETY/Pruning-via-Merging
Chain-of-Thought (CoT) prompting combined with large language models (LLM) has shown great potential in improving performance on challenging reasoning tasks. While understanding why CoT prompting is effective is crucial for the application and improvement of CoT prompting, few studies have addressed this issue. Besides, almost no prior work has conducted theoretical analysis on CoT prompting in the context of black-box models. In this paper, we approach the analysis of CoT prompting in black-box LLMs from an information-theoretic perspective. Specifically, we propose a new metric, EPVI (Estimated Pointwise V-Information), which extends the concept of pointwise V-information to black-box models, quantifying the label-relevant new information introduced by CoT prompting beyond the pre-existing information in the input. Based on this, we conduct a series of experiments at both the task and instance levels to analyze CoT prompting, demonstrating that the effectiveness of CoT prompting can be attributed to its capacity to influence the difficulty of model inference by augmenting or reducing the model-usable information. Furthermore, we show that selecting high-quality demonstrations of CoT reasoning based on EPVI can improve the downstream performance of reasoning tasks.