HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Yilin Jiang; Fei Tan; Xuanyu Yin; Leng Jing; Aimin Zhou

Abstract

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI.
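The Propose-Validate-Revise loop with quota control and deduplication described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the HACHIMI codebase: the function names (`allocate_quotas`, `propose`, `validate`, `revise`, `generate`), the persona fields, the age-grade rule, and the strata weights are all hypothetical. A random proposer stands in for the LLM call, a single hand-written rule stands in for the neuro-symbolic validator, and token-level Jaccard similarity stands in for semantic deduplication.

```python
import random

HOBBIES = ["chess", "football", "painting", "coding", "reading", "music"]

def allocate_quotas(weights, total):
    """Split `total` slots across strata by integer weights (largest remainder)."""
    denom = sum(weights.values())
    quotas = {k: w * total // denom for k, w in weights.items()}
    leftover = total - sum(quotas.values())
    by_remainder = sorted(weights, key=lambda k: (weights[k] * total) % denom,
                          reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas

def propose(stratum, rng):
    """Hypothetical proposer: a stand-in for an LLM call that drafts a persona."""
    grade = rng.randint(1, 12)
    hobby = rng.choice(HOBBIES)
    return {
        "stratum": stratum,
        "grade": grade,
        "age": rng.randint(5, 19),  # may be inconsistent with grade on purpose
        "text": f"grade {grade} student with {stratum} math achievement "
                f"who enjoys {hobby}",
    }

def validate(persona):
    """Hypothetical symbolic check: age must be plausible for the grade."""
    errors = []
    if not 5 <= persona["age"] - persona["grade"] <= 7:
        errors.append("age_grade_mismatch")
    return errors

def revise(persona, errors):
    """Hypothetical repair step: fix only the fields the validator flagged."""
    if "age_grade_mismatch" in errors:
        persona = dict(persona, age=persona["grade"] + 6)
    return persona

def jaccard(a, b):
    """Token-set overlap; a crude stand-in for embedding-based similarity."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def generate(weights, total, seed=0, dedup_threshold=0.9, max_attempts=10_000):
    rng = random.Random(seed)
    accepted = []
    for stratum, quota in allocate_quotas(weights, total).items():
        kept, attempts = [], 0
        while len(kept) < quota and attempts < max_attempts:
            attempts += 1
            persona = propose(stratum, rng)
            errors = validate(persona)
            if errors:
                persona = revise(persona, errors)
            if validate(persona):
                continue  # still invalid after revision: discard
            if any(jaccard(persona["text"], q["text"]) >= dedup_threshold
                   for q in kept):
                continue  # near-duplicate within the stratum: discard
            kept.append(persona)
        accepted.extend(kept)
    return accepted

personas = generate({"low": 2, "mid": 5, "high": 3}, total=20)
print(len(personas), personas[0]["text"])
```

Filling each stratum to a fixed quota before moving on is what keeps the output distribution under control regardless of what the proposer tends to produce; the deduplication check inside the same loop is one simple way to counter repeated outputs.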

Anthology ID: 2026.findings-acl.1080
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 21461–21506
URL: https://aclanthology.org/2026.findings-acl.1080/
Cite (ACL): Yilin Jiang, Fei Tan, Xuanyu Yin, Leng Jing, and Aimin Zhou. 2026. HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 21461–21506, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents (Jiang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1080.pdf
Checklist: 2026.findings-acl.1080.checklist.pdf
