Xiaoyuan Yi - ACL Anthology

Xiaoyuan Yi

2026

Generative Personality Simulation via Theory-Informed Structured Interview
Pengda Wang | Huiqi Zou | Han Jiang | Hanjie Chen | Tianjun Sun | Xiaoyuan Yi | Ziang Xiao | Frederick L. Oswald
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite their potential as human proxies, LLMs often fail to generate heterogeneous data with human-like diversity, thereby diminishing their value in advancing social science research. To address this gap, we propose a novel method to incorporate psychological insights into LLM simulation through the Personality Structured Interview (PSI). PSI leverages psychometric scale-development procedures to capture personality-related linguistic information from a formal psychological perspective. To systematically evaluate simulation fidelity, we developed a measurement theory grounded evaluation procedure that considers the latent construct nature of personality and evaluates its reliability, structural validity, and external validity. Results from three experiments demonstrate that PSI effectively improves human-like heterogeneity in LLM-simulated personality data and predicts personality-related behavioral outcomes. We further offer a theoretical framework for designing theory-informed structured interviews to enhance the reliability and effectiveness of LLMs in simulating human-like data for broader psychometric research.

2025

Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and Enhancement
Bingbing Xu | Jing Yao | Xiaoyuan Yi | Aishan Maoliniyazi | Xing Xie | Xiaofeng Meng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As Large Language Models (LLMs) advance, aligning them with human values is critical for their responsible development. Value principles serve as the foundation for clarifying alignment goals.Multiple sets of value principles have been proposed, such as HHH (helpful, honest, harmless) and instructions for data synthesis in reinforcement learning from AI feedback (RLAIF). However, most of them are heuristically crafted, without consideration of three primary challenges in practical LLM alignment: 1) Comprehensiveness to deal with diverse and even unforeseen scenarios in which LLMs could be applied; 2) Precision to provide LLMs with clear and actionable guidance in specific scenarios; and 3) Compatability to avoid internal contracts between principles.In this paper, we formalize quantitative metrics to evaluate value principles along the three desirable properties. Building on these metrics, we propose the Hierarchical Value Principle framework (HiVaP), which constructs a hierarchical principle set and retrieves principles tailored to each scenario in a cascading way, addressing above challenges.Experimental results validate that the three metrics capture the effectiveness of value principles for LLM alignment, and our HiVaP framework that enhances these metrics leads to superior alignment. Warning: This paper contains several toxic and offensive statements.

Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights
Sooyung Choi | Jaehyeok Lee | Xiaoyuan Yi | Jing Yao | Xing Xie | JinYeong Bak
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The application scope of Large Language Models (LLMs) continues to expand, leading to increasing interest in personalized LLMs that align with human values. However, aligning these models with individual values raises significant safety concerns, as certain values may correlate with harmful information. In this paper, we identify specific safety risks associated with value-aligned LLMs and investigate the psychological principles behind these challenges. Our findings reveal two key insights. (1) Value-aligned LLMs are more prone to harmful behavior compared to non-fine-tuned models and exhibit slightly higher risks in traditional safety evaluations than other fine-tuned models. (2) These safety issues arise because value-aligned LLMs genuinely generate text according to the aligned values, which can amplify harmful outcomes. Using a dataset with detailed safety categories, we find significant correlations between value alignment and safety risks, supported by psychological hypotheses. This study offers insights into the “black box” of value alignment and proposes in-context alignment methods to enhance the safety of value-aligned LLMs.

Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs’ Value Evaluation
Jing Yao | Xiaoyuan Yi | Shitong Duan | Jindong Wang | Yuzhuo Bai | Muhua Huang | Yang Ou | Scarlett Li | Peng Zhang | Tun Lu | Zhicheng Dou | Maosong Sun | James Evans | Xing Xie
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

As large language models (LLMs) are gradually integrated into human daily life, assessing their underlying values becomes essential for understanding their risks and alignment with specific preferences. Despite growing efforts, current value evaluation methods face two key challenges. C1. Evaluation Validity: Static benchmarks fail to reflect intended values or yield informative results due to data contamination or a ceiling effect. C2. Result Interpretation: They typically reduce the pluralistic and often incommensurable values to one-dimensional scores, which hinders users from gaining meaningful insights and guidance. To address these challenges, we present Value Compass Benchmarks, the first dynamic, online and interactive platform specially devised for comprehensive value diagnosis of LLMs. It (1) grounds evaluations in multiple basic value systems from social science; (2) develops a generative evolving evaluation paradigm that automatically creates real-world test items co-evolving with ever-advancing LLMs; (3) offers multi-faceted result interpretation, including (i) fine-grained scores and case studies across 27 value dimensions for 33 leading LLMs, (ii) customized comparisons, and (iii) visualized analysis of LLMs’ alignment with cultural values. We hope Value Compass Benchmarks serves as a navigator for further enhancing LLMs’ safety and alignment, benefiting their responsible and adaptive development.

MoVa: Towards Generalizable Classification of Human Morals and Values
Ziyu Chen | Junfei Sun | Chenxi Li | Tuan Dung Nguyen | Jing Yao | Xiaoyuan Yi | Xing Xie | Chenhao Tan | Lexing Xie
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Identifying human morals and values embedded in language is essential to empirical studies of communication. However, researchers often face substantial difficulty navigating the diversity of theoretical frameworks and data available for their analysis. Here, we contribute MoVa, a well-documented suite of resources for generalizable classification of human morals and values, consisting of (1) 16 labeled datasets and benchmarking results from four theoretically-grounded frameworks; (2) a lightweight LLM prompting strategy that outperforms fine-tuned models across multiple domains and frameworks; and (3) a new application that helps evaluate psychological surveys. In practice, we specifically recommend a classification strategy, all@once, that scores all related concepts simultaneously, resembling the well-known multi-label classifier chain. The data and methods in MoVa can facilitate many fine-grained interpretations of human and machine communication, with potential implications for the alignment of machine behavior.

MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?
Xixian Yong | Jianxun Lian | Xiaoyuan Yi | Xiao Zhou | Xing Xie
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) have been widely adopted as the core of agent frameworks in various scenarios, such as social simulations and AI companions. However, the extent to which they can replicate human-like motivations remains an underexplored question. Existing benchmarks are constrained by simplistic scenarios and the absence of character identities, resulting in an information asymmetry with real-world situations. To address this gap, we propose MotiveBench, which consists of 200 rich contextual scenarios and 600 reasoning tasks covering multiple levels of motivation. Using MotiveBench, we conduct extensive experiments on seven popular model families, comparing different scales and versions within each family. The results show that even the most advanced LLMs still fall short in achieving human-like motivational reasoning. Our analysis reveals key findings, including the difficulty LLMs face in reasoning about “love & belonging” motivations and their tendency toward excessive rationality and idealism. These insights highlight a promising direction for future research on the humanization of LLMs.

2024

CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models
Yuhang Wang | Yanxu Zhu | Chao Kong | Shuyu Wei | Xiaoyuan Yi | Xing Xie | Jitao Sang
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP

As the scaling of Large Language Models (LLMs) has dramatically enhanced their capabilities, there has been a growing focus on the alignment problem to ensure their responsible and ethical use. While existing alignment efforts predominantly concentrate on universal values such as the HHH principle, the aspect of culture, which is inherently pluralistic and diverse, has not received adequate attention. This work introduces a new benchmark, CDEval, aimed at evaluating the cultural dimensions of LLMs. CDEval is constructed by incorporating both GPT-4’s automated generation and human verification, covering six cultural dimensions across seven domains. Our comprehensive experiments provide intriguing insights into the culture of mainstream LLMs, highlighting both consistencies and variations across different dimensions and domains. The findings underscore the importance of integrating cultural considerations in LLM development, particularly for applications in diverse cultural settings. This benchmark serves as a valuable resource for cultural studies in LLMs, paving the way for more culturally aware and sensitive models.

Negating Negatives: Alignment with Human Negative Samples via Distributional Dispreference Optimization
Shitong Duan | Xiaoyuan Yi | Peng Zhang | Yan Liu | Zheng Liu | Tun Lu | Xing Xie | Ning Gu
Findings of the Association for Computational Linguistics: EMNLP 2024

Large language models (LLMs) have revolutionized the role of AI, yet pose potential social risks. To steer LLMs towards human preference, alignment technologies have been introduced and gained increasing attention. Nevertheless, existing methods heavily rely on high-quality positive-negative training pairs, suffering from noisy positive responses that are barely distinguishable from negative ones. Given recent LLMs’ proficiency in generating helpful responses, this work pivots towards a new research question: **can we achieve alignment using solely human-annotated negative samples, preserving helpfulness while reducing harmfulness?** For this purpose, we propose Distributional Dispreference Optimization (D²O), which maximizes the discrepancy between dispreferred responses and the generated non-negative ones. In this way, D²O effectively eschews harmful information without incorporating noisy positive samples, while avoiding collapse using self-generated responses as anchors. We demonstrate that D²O can be regarded as learning a distributional preference model reflecting human dispreference against negative responses, which is theoretically an upper bound of the instance-level DPO. Extensive experiments manifest that our method achieves comparable generation quality and surpasses the latest strong baselines in producing less harmful and more informative responses with better training stability and faster convergence.

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Value
Jing Yao | Xiaoyuan Yi | Yifan Gong | Xiting Wang | Xing Xie
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Value alignment is crucial for the responsible development of Large Language Models (LLMs). However, how to define values in this context remains largely unexplored. Existing work mainly specifies values as risk criteria formulated in the AI community, e.g., fairness and privacy protection, suffering from poor clarity, adaptability and transparency. Leveraging basic values established in humanity and social science that are compatible with values across cultures, this paper introduces a novel value space spanned by multiple basic value dimensions and proposes BaseAlign, a corresponding value alignment paradigm. Applying the representative Schwartz’s Theory of Basic Values as an instantiation, we construct FULCRA, a dataset consisting of 20k (LLM output, value vector) pairs. LLMs’ outputs are mapped into the K-dim value space beyond simple binary labels, by identifying their underlying priorities for these value dimensions. Extensive analysis and experiments on FULCRA: (1) reveal the essential relation between basic values and LLMs’ behaviors, (2) demonstrate that our paradigm with basic values not only covers existing risks but also anticipates the unidentified ones, and (3) manifest BaseAlign’s superiority in alignment performance with less data, paving the way for addressing the above three challenges.

2023

DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation
Yuxi Feng | Xiaoyuan Yi | Xiting Wang | Laks Lakshmanan, V.S. | Xing Xie
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Self-training (ST) has prospered again in language understanding by augmenting the fine-tuning of big pre-trained models when labeled data is insufficient. However, it remains challenging to incorporate ST into attribute-controllable language generation. Augmented only by self-generated pseudo text, generation models over-exploit the previously learned text space and fail to explore a larger one, suffering from a restricted generalization boundary and limited controllability. In this work, we propose DuNST, a novel ST framework to tackle these problems. DuNST jointly models text generation and classification as a dual process and further perturbs and escapes from the collapsed space by adding two kinds of flexible noise. In this way, our model could construct and utilize both pseudo text generated from given labels and pseudo labels predicted from available unlabeled text, which are gradually refined during the ST phase. We theoretically demonstrate that DuNST can be regarded as enhancing the exploration of the potentially larger real text space while maintaining exploitation, guaranteeing improved performance. Experiments on three controllable generation tasks show that DuNST significantly boosts control accuracy with comparable generation fluency and diversity against several strong baselines.

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang | Xiaoyuan Yi | Han Jiang | Shanlin Zhou | Zhihua Wei | Xing Xie
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent large-scale Visual-Language Generative Models (VLGMs) have achieved unprecedented improvement in multimodal image/text generation. However, these models might also generate toxic content, e.g., offensive text and pornography images, raising significant ethical risks. Despite exhaustive studies on toxic degeneration of language models, this problem remains largely unexplored within the context of visual-language generation. This work delves into the propensity for toxicity generation and susceptibility to toxic data across various VLGMs. For this purpose, we built ToViLaG, a dataset comprising 32K co-toxic/mono-toxic text-image pairs and 1K innocuous but evocative text that tends to stimulate toxicity. Furthermore, we propose WInToRe, a novel toxicity metric tailored to visual-language generation, which theoretically reflects different aspects of toxicity considering both input and output. On such a basis, we benchmarked the toxicity of a diverse spectrum of VLGMs and discovered that some models do more evil than expected while some are more vulnerable to infection, underscoring the necessity of VLGMs detoxification. Therefore, we develop an innovative bottleneck-based detoxification method. Our method could reduce toxicity while maintaining comparable generation quality, providing a promising initial solution to this line of research.

2022

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li | Xiaoyuan Yi | Jinyi Hu | Maosong Sun | Xing Xie
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recently, powerful Transformer architectures have proven superior in generating high-quality sentences. Nevertheless, these models tend to produce dull high-frequency phrases, severely hurting the diversity and novelty of generated text. In this work, we dig into the intrinsic mechanism of this problem and found that sparser attention values in Transformer could improve diversity. To understand such a phenomenon, we first conduct both empirical and theoretical analysis and then attribute it to representation degeneration caused by the attentive mixture of the hidden states during training. We term this process the Trap of Mediocrity. To escape from such a trap, we introduce a novel attention regularization loss to control the sharpness of the attention distribution, which is transparent to model structures and can be easily implemented within 20 lines of python code. We prove that this method could be mathematically regarded as learning a Bayesian approximation of posterior attention. Experiments show that our method improved the diversity and novelty of the generated text while maintaining comparable quality on a variety of conditional and unconditional generation tasks.

Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation
Jinyi Hu | Xiaoyuan Yi | Wenhao Li | Maosong Sun | Xing Xie
Findings of the Association for Computational Linguistics: EMNLP 2022

Variational Auto-Encoder (VAE) has been widely adopted in text generation. Among many variants, recurrent VAE learns token-wise latent variables with each conditioned on the preceding ones, which captures sequential variability better in the era of RNN. However, it is unclear how to incorporate such recurrent dynamics into the recently dominant Transformer due to its parallelism. In this work, we propose TRACE, a Transformer-based recurrent VAE structure. TRACE imposes recurrence on segment-wise latent variables with arbitrarily separated text segments and constructs the posterior distribution with residual parameterization. Besides, we design an acceleration method by approximating idempotent matrices, which allows parallelism while maintaining the conditional dependence of latent variables. We demonstrate that TRACE could deduce a non-zero lower bound of the KL term and enhance the entanglement of each segment and preceding latent variables, providing a theoretical guarantee of generation diversity. Experiments on two unconditional and one conditional generation task show that TRACE achieves significantly improved diversity while maintaining satisfactory generation quality.

Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation
Jinyi Hu | Xiaoyuan Yi | Wenhao Li | Maosong Sun | Xing Xie
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The past several years have witnessed Variational Auto-Encoder’s superiority in various text generation tasks. However, due to the sequential nature of the text, auto-regressive decoders tend to ignore latent variables and then reduce to simple language models, known as the KL vanishing problem, which would further deteriorate when VAE is combined with Transformer-based structures. To ameliorate this problem, we propose Della, a novel variational Transformer framework. Della learns a series of layer-wise latent variables with each inferred from those of lower layers and tightly coupled with the hidden states by low-rank tensor product. In this way, Della forces these posterior latent variables to be fused deeply with the whole computation path and hence incorporate more information. We theoretically demonstrate that our method can be regarded as entangling latent variables to avoid posterior information decrease through layers, enabling Della to get higher non-zero KL values even without any annealing or thresholding tricks. Experiments on four unconditional and three conditional generation tasks show that Della could better alleviate KL vanishing and improve both quality and diversity compared to several strong baselines.

2021

Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Zhenghao Liu | Xiaoyuan Yi | Maosong Sun | Liner Yang | Tat-Seng Chua
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Grammatical Error Correction (GEC) aims to correct writing errors and help language learners improve their writing skills. However, existing GEC models tend to produce spurious corrections or fail to detect lots of errors. The quality estimation model is necessary to ensure learners get accurate GEC results and avoid misleading from poorly corrected sentences. Well-trained GEC models can generate several high-quality hypotheses through decoding, such as beam search, which provide valuable GEC evidence and can be used to evaluate GEC quality. However, existing models neglect the possible GEC evidence from different hypotheses. This paper presents the Neural Verification Network (VERNet) for GEC quality estimation with multiple hypotheses. VERNet establishes interactions among hypotheses with a reasoning graph and conducts two kinds of attention mechanisms to propagate GEC evidence to verify the quality of generated hypotheses. Our experiments on four GEC datasets show that VERNet achieves state-of-the-art grammatical error detection performance, achieves the best quality estimation results, and significantly improves GEC performance by reranking hypotheses. All data and source codes are available at https://github.com/thunlp/VERNet.

2019

Jiuge: A Human-Machine Collaborative Chinese Classical Poetry Generation System
Zhipeng Guo | Xiaoyuan Yi | Maosong Sun | Wenhao Li | Cheng Yang | Jiannan Liang | Huimin Chen | Yuhui Zhang | Ruoyu Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Research on the automatic generation of poetry, the treasure of human culture, has lasted for decades. Most existing systems, however, are merely model-oriented, which input some user-specified keywords and directly complete the generation process in one pass, with little user participation. We believe that the machine, being a collaborator or an assistant, should not replace human beings in poetic creation. Therefore, we proposed Jiuge, a human-machine collaborative Chinese classical poetry generation system. Unlike previous systems, Jiuge allows users to revise the unsatisfied parts of a generated poem draft repeatedly. According to the revision, the poem will be dynamically updated and regenerated. After the revision and modification procedure, the user can write a satisfying poem together with Jiuge system collaboratively. Besides, Jiuge can accept multi-modal inputs, such as keywords, plain text or images. By exposing the options of poetry genres, styles and revision modes, Jiuge, acting as a professional assistant, allows constant and active participation of users in poetic creation.

2018

Automatic Poetry Generation with Mutual Reinforcement Learning
Xiaoyuan Yi | Maosong Sun | Ruoyu Li | Wenhao Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Poetry is one of the most beautiful forms of human language art. As a crucial step towards computer creativity, automatic poetry generation has drawn researchers’ attention for decades. In recent years, some neural models have made remarkable progress in this task. However, they are all based on maximum likelihood estimation, which only learns common patterns of the corpus and results in loss-evaluation mismatch. Human experts evaluate poetry in terms of some specific criteria, instead of word-level likelihood. To handle this problem, we directly model the criteria and use them as explicit rewards to guide gradient update by reinforcement learning, so as to motivate the model to pursue higher scores. Besides, inspired by writing theories, we propose a novel mutual reinforcement learning schema. We simultaneously train two learners (generators) which learn not only from the teacher (rewarder) but also from each other to further improve performance. We experiment on Chinese poetry. Based on a strong basic model, our method achieves better results and outperforms the current state-of-the-art method.

Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement
Cheng Yang | Maosong Sun | Xiaoyuan Yi | Wenhao Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The ability to write diverse poems in different styles under the same poetic imagery is an important characteristic of human poetry writing. Most previous works on automatic Chinese poetry generation focused on improving the coherency among lines. Some work explored style transfer but suffered from expensive expert labeling of poem styles. In this paper, we target on stylistic poetry generation in a fully unsupervised manner for the first time. We propose a novel model which requires no supervised style labeling by incorporating mutual information, a concept in information theory, into modeling. Experimental results show that our model is able to generate stylistic poems without losing fluency and coherency.

Chinese Poetry Generation with a Salient-Clue Mechanism
Xiaoyuan Yi | Ruoyu Li | Maosong Sun
Proceedings of the 22nd Conference on Computational Natural Language Learning

As a precious part of the human cultural heritage, Chinese poetry has influenced people for generations. Automatic poetry composition is a challenge for AI. In recent years, significant progress has been made in this area benefiting from the development of neural networks. However, the coherence in meaning, theme or even artistic conception for a generated poem as a whole still remains a big problem. In this paper, we propose a novel Salient-Clue mechanism for Chinese poetry generation. Different from previous work which tried to exploit all the context information, our model selects the most salient characters automatically from each so-far generated line to gradually form a salient clue, which is utilized to guide successive poem generation process so as to eliminate interruptions and improve coherence. Besides, our model can be flexibly extended to control the generated poem in different aspects, for example, poetry style, which further enhances the coherence. Experimental results show that our model is very effective, outperforming three strong baselines.

Co-authors

Tat-Seng Chua 1

Zhicheng Dou (窦志成) 1

Laks Lakshmanan, V.S. 1

Jiannan Liang 1

Zhenghao Liu (刘正皓) 1

Aishan Maoliniyazi 1

Xiaofeng Meng 1

Tuan Dung Nguyen 1

Frederick L. Oswald 1

Venues