Minh Duc Chu


2025

pdf bib
Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
Minh Duc Chu | Zihao He | Rebecca Dorn | Kristina Lerman
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs to reveal unhealthy beliefs and successfully differentiate communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation and broader applications in public health and social science research.

2024

pdf bib
Community-Cross-Instruct: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities
Zihao He | Minh Duc Chu | Rebecca Dorn | Siyi Guo | Kristina Lerman
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable the creating of computational representations or “digital twins” of populations that generate human-like responses mimicking the population’s language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community’s online discussions, Community-Cross-Instruct automatically generates instruction-output pairs by an advanced LLM to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method’s utility in accurately representing political and diet communities on Reddit. Unlike prior methods requiring human-authored instructions, Community-Cross-Instruct generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains. This work enables cost-effective and automated surveying of diverse online communities.