FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback

SeongYeub Chu; Jongwoo Kim; Mun Yong Yi

Abstract

Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use. Experiments on the ASAP++ benchmark show that FeedEval closely aligns with human expert judgments and that essay scoring models trained with FeedEval-filtered high-quality feedback achieve superior scoring performance. Furthermore, revision experiments using small LLMs show that the high-quality feedback identified by FeedEval leads to more effective essay revisions. We release our code and curated datasets at: https://github.com/BBeeChu/FeedEval.git.

Anthology ID: 2026.findings-acl.615
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12648–12674
URL: https://aclanthology.org/2026.findings-acl.615/
Cite (ACL): SeongYeub Chu, Jongwoo Kim, and Mun Yong Yi. 2026. FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12648–12674, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback (Chu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.615.pdf
Checklist: 2026.findings-acl.615.checklist.pdf
