Manying Zhang


2025

Large language models (LLMs) have made significant advancements, but their increasing capabilities present serious risks of misuse, particularly in open-weight models, where direct access to the model’s parameters is possible. Current safeguards, designed for closed-weight API models, are inadequate for open-weight models, as minimal fine-tuning can bypass these protections. Preserving the integrity of open-weight LLMs before deployment has thus become a critical challenge. We argue that these vulnerabilities stem from an overemphasis on maximizing the LLM’s log-likelihood during training, which amplifies data biases, especially with large datasets. To address these issues, we introduce Kahneman and Tversky’s Prospect Theoretic Integrity Preserving Alignment (KT-IPA), a framework that prioritizes maximizing generative utility rather than a singular optimization metric. This approach strengthens LLMs against misuse and weaponization while maintaining high performance, even after extensive fine-tuning. Our results demonstrate that integrating prospect theory into LLM training enhances robustness, security, and responsible innovation in this rapidly evolving field. Our code is available at https://anonymous.4open.science/r/KT-IPA-40B7
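The abstract above contrasts log-likelihood maximization with a prospect-theoretic utility. While the paper's exact objective is not specified here, the classic Kahneman–Tversky value function it builds on can be sketched as follows (the function name and parameter defaults, taken from Tversky and Kahneman's 1992 estimates, are illustrative, not the paper's implementation):

```python
def prospect_value(x: float, alpha: float = 0.88, beta: float = 0.88, lam: float = 2.25) -> float:
    """Kahneman-Tversky value function over gains/losses relative to a
    reference point: concave for gains, convex and steeper for losses
    (loss aversion, lam > 1). Defaults follow Tversky & Kahneman (1992)."""
    if x >= 0:
        return x ** alpha          # diminishing sensitivity to gains
    return -lam * ((-x) ** beta)   # losses loom larger than gains
```

The key property motivating the abstract's argument is asymmetry: a loss of a given magnitude is weighted more heavily than an equal gain, so an objective built on this value function penalizes harmful (loss-like) generations more than a symmetric likelihood objective would.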
Accurate and personalized product recommendation is central to user satisfaction in e-commerce. However, a persistent language gap often exists between user queries and product titles or descriptions. While traditional user behavior-based recommenders and LLM-based Retrieval-Augmented Generation systems typically optimize for maximum-likelihood objectives, they may struggle to bridge this gap or capture users’ true intent. In this paper, we propose a Prospect Theoretic Self-Alignment strategy that reframes LLM-based recommendation as a utility-driven process. Given a user query and a set of candidate products, our model acts as a seller who anticipates latent user needs and generates product descriptions tailored to the user’s perspective. Simultaneously, it simulates the user’s decision-making utility to assess whether the generated content would lead to a purchase. This self-alignment is achieved through a training strategy grounded in Kahneman and Tversky’s prospect theory, ensuring that recommendations are optimized for perceived user value rather than likelihood alone. Experiments on real-world product data demonstrate substantial improvements in intent alignment and recommendation quality, validating the effectiveness of our approach in producing personalized and decision-aware recommendations.