SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs

Sihang Zhao; Kangrui Yu; Youliang Yuan; Pinjia He; Hongyi Wen

Abstract

Large Language Models (LLMs) have been widely explored in educational scenarios. We identify a critical vulnerability in current educational LLMs: pedagogical jailbreaks, in which students use answer-inducing prompts to elicit solutions rather than scaffolded instruction. To enable systematic study, we unify and formalize safe, helpful, and pedagogical behaviors with a knowledge-mastery graph and introduce SHAPE, a benchmark of 9,087 student-question pairs for evaluating tutoring behavior under adversarial pressure. We propose a graph-augmented tutoring pipeline that infers prerequisite concepts from queries, identifies mastery gaps, and routes generation between instructing and problem-solving via explicit gating. Experiments across multiple LLMs show that our method yields significantly improved safety under two pedagogical jailbreak settings, while maintaining near-ceiling helpfulness under the same evaluation protocol. Our code and data are available at https://github.com/MAPS-research/SHaPE
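The gating idea in the abstract (infer prerequisites, find mastery gaps, then route between instructing and problem-solving) can be illustrated with a minimal sketch. This is not the authors' implementation: the graph contents, the `Student` class, and the functions `prerequisites_of`, `mastery_gaps`, and `route` are all hypothetical names chosen for illustration, and the rule "solve only when every transitive prerequisite is mastered" is an assumption about what explicit gating might look like.

```python
from dataclasses import dataclass, field

# Hypothetical knowledge graph: concept -> concepts to master first.
# The concepts and edges are illustrative, not taken from the SHAPE benchmark.
PREREQUISITES = {
    "quadratic_formula": ["factoring", "square_roots"],
    "factoring": ["multiplication"],
    "square_roots": [],
    "multiplication": [],
}


@dataclass
class Student:
    # Concepts the student is assumed to have mastered already.
    mastered: set = field(default_factory=set)


def prerequisites_of(target, graph):
    """Collect all transitive prerequisites of `target` (excluding `target`)."""
    seen = set()
    stack = list(graph.get(target, []))
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(graph.get(concept, []))
    return seen


def mastery_gaps(target, student, graph):
    """Prerequisites of `target` that the student has not mastered."""
    return prerequisites_of(target, graph) - student.mastered


def route(target, student, graph):
    """Assumed gate: instruct if any gap remains, otherwise allow solving."""
    gaps = mastery_gaps(target, student, graph)
    if gaps:
        return "instruct", sorted(gaps)
    return "solve", []


if __name__ == "__main__":
    novice = Student(mastered={"multiplication"})
    print(route("quadratic_formula", novice, PREREQUISITES))
```

Under these assumptions, a student who has mastered only `multiplication` is routed to instruction on the missing concepts, while a student with no remaining gaps is routed to direct problem-solving. A real system would infer the target concept from the query and estimate mastery from interaction history, which this sketch leaves out.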

Anthology ID: 2026.acl-long.529
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 11537–11553
URL: https://aclanthology.org/2026.acl-long.529/
Cite (ACL): Sihang Zhao, Kangrui Yu, Youliang Yuan, Pinjia He, and Hongyi Wen. 2026. SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11537–11553, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs (Zhao et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.529.pdf
Checklist: 2026.acl-long.529.checklist.pdf