Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, Hung-yi Lee


Abstract
Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research. However, it is unclear whether these LLM-based evaluators can be effectively applied in real-world classrooms to assess student assignments. This empirical report shares how we use GPT-4 as an automatic assignment evaluator in a university course with over 1000 students. Based on student responses, we found that LLM-based assignment evaluators are generally acceptable to students when they have free access to these tools. However, students also noted that the LLM sometimes fails to adhere to the evaluation instructions, resulting in unreasonable assessments. Additionally, we observed that students can easily manipulate the LLM to output specific strings, allowing them to achieve high scores without meeting the assignment rubric. Based on student feedback and our experience, we offer several recommendations for effectively integrating LLMs into future classroom evaluations. Our observation also highlights potential directions for improving LLM-based evaluators, including their instruction-following ability and vulnerability to prompt hacking.
Anthology ID:
2024.emnlp-main.146
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2489–2513
Language:
URL:
https://aclanthology.org/2024.emnlp-main.146
DOI:
10.18653/v1/2024.emnlp-main.146
Bibkey:
Cite (ACL):
Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, and Hung-yi Lee. 2024. Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2489–2513, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course (Chiang et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.146.pdf