Qiong Wu
2026
Curriculum Learning based Hierarchical Scoring and Analysis Framework for Question Answering Task Evaluation
Qiong Wu | Tan Yue | Jianxin Liang | Zhen Li | Kai He | Shuai Zhao | Dongyan Zhao
Findings of the Association for Computational Linguistics: ACL 2026
The rapid progress of large language models (LLMs) has increased the demand for efficient and reliable evaluation of question answering (QA) systems. Existing evaluation methods either rely on rule-based matching with shallow semantic understanding or adopt LLM-as-a-Judge approaches that incur high cost and latency while offering limited error interpretability. To address these limitations, we propose HiEval, a curriculum-learning-based hierarchical framework for QA task evaluation that supports both quick scoring and fine-grained error analysis. HiEval contains a quick scoring model (HiEval-QS) that predicts three-level correctness labels, and an error analysis model (HiEval-EA) that classifies incorrect responses into five error types. HiEval incorporates a class-balanced focal loss to handle label imbalance, experience replay to prevent forgetting, and contrastive unlikelihood optimization to improve error discrimination. We also construct two large-scale human-annotated evaluation datasets collected from 50 QA-related datasets and covering 8 task types, and we release two challenging benchmarks. Extensive experiments show that HiEval achieves state-of-the-art performance on both quick scoring and error analysis tasks, outperforming all baseline methods, including GPT-5, while being approximately 25× faster.
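The abstract names a class-balanced focal loss but does not give its formulation. The following is a minimal sketch of one common variant, assuming the "effective number of samples" re-weighting combined with a softmax focal term; it may differ from what HiEval actually uses. The function names, the label counts, and the values of `beta` and `gamma` are illustrative assumptions, not taken from the paper.

```python
import math


def class_balanced_weights(class_counts, beta=0.999):
    """Per-class weights from the effective-number-of-samples heuristic.

    Assumes every count is a positive integer. Rarer classes get larger
    weights; weights are rescaled to sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cb_focal_loss(logits, label, weights, gamma=2.0):
    """Class-balanced focal loss for a single example.

    The (1 - p_t) ** gamma factor down-weights examples the model
    already classifies confidently; weights[label] compensates for
    label imbalance.
    """
    p_t = softmax(logits)[label]
    return -weights[label] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical counts for a three-level correctness label set.
counts = [8000, 1500, 500]
weights = class_balanced_weights(counts)

# Loss for one example whose true label is the rarest class (index 2).
loss = cb_focal_loss([2.0, 0.5, -1.0], label=2, weights=weights)
```

With `gamma=0` and uniform weights this reduces to ordinary cross-entropy, which gives a simple sanity check for the sketch.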