Zhang Han
2025
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
Jiamin Su | Yibo Yan | Fangteng Fu | Zhang Han | Jingheng Ye | Xiang Liu | Jiahao Huo | Huiyu Zhou | Xuming Hu
Findings of the Association for Computational Linguistics: ACL 2025
Automated Essay Scoring (AES) plays a crucial role in educational assessment by providing scalable and consistent evaluations of writing tasks. However, traditional AES systems face three major challenges: (i) reliance on handcrafted features that limit generalizability, (ii) difficulty in capturing fine-grained traits like coherence and argumentation, and (iii) inability to handle multimodal contexts. In the era of Multimodal Large Language Models (MLLMs), we propose **EssayJudge**, the **first multimodal benchmark to evaluate AES capabilities across lexical-, sentence-, and discourse-level traits**. By leveraging MLLMs’ strengths in trait-specific scoring and multimodal context understanding, EssayJudge aims to offer precise, context-rich evaluations without manual feature engineering, addressing longstanding AES limitations. Our experiments with 18 representative MLLMs reveal gaps in AES performance compared to human evaluation, particularly in discourse-level traits, highlighting the need for further advancements in MLLM-based AES research. Our dataset and code will be available upon acceptance.
2021
Few-Shot Charge Prediction with Multi-Grained Features and Mutual Information
Zhang Han | Zhu Yutao | Dou Zhicheng | Wen Ji-Rong
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Charge prediction aims to predict the final charge for a case according to its fact description and plays an important role in legal assistance systems. With deep learning based methods, prediction on high-frequency charges has achieved promising results, but prediction on few-shot charges is still challenging. In this work, we propose a framework with multi-grained features and mutual information for few-shot charge prediction. Specifically, we extract coarse- and fine-grained features to enhance the model’s representation capability, based on which the few-shot charges can be better distinguished. Furthermore, we propose a loss function based on mutual information. This loss function leverages the prior distribution of the charges to tune their weights, so that the few-shot charges can contribute more to model optimization. Experimental results on several datasets demonstrate the effectiveness and robustness of our method. In addition, our method works well on tiny datasets and trains more efficiently, which makes it more applicable in real scenarios.
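As a rough illustration of how a class prior can shift the training signal toward rare charges, the sketch below adds the log prior to each logit before a standard softmax cross-entropy. This is a generic prior-adjusted loss assumed here for illustration only; it is not taken from the paper, and the function names and toy label counts are hypothetical.

```python
import math


def class_priors(label_counts):
    """Estimate the prior probability of each class from training label counts."""
    total = sum(label_counts)
    return [count / total for count in label_counts]


def prior_adjusted_cross_entropy(logits, target, priors):
    """Softmax cross-entropy computed on logits shifted by the log class prior.

    Assumption: this is one common prior-based adjustment, not necessarily
    the loss proposed in the paper.
    """
    adjusted = [z + math.log(p) for z, p in zip(logits, priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[target]


# Toy setup (hypothetical): one frequent charge and one few-shot charge.
priors = class_priors([990, 10])
logits = [0.0, 0.0]  # the model is undecided between the two charges

loss_frequent = prior_adjusted_cross_entropy(logits, 0, priors)
loss_rare = prior_adjusted_cross_entropy(logits, 1, priors)
```

With identical logits, the example of the rare class incurs the larger loss, so under this adjustment a model has to score a few-shot class well above a frequent one before its loss becomes small.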