Supakorn Hiranwipas


2025

pdf bib
Evaluating Sampling Strategies for Similarity-Based Short Answer Scoring: a Case Study in Thailand
Pachara Boonsarngsuk | Pacharapon Arpanantikul | Supakorn Hiranwipas | Wipu Watcharakajorn | Ekapol Chuangsuwanich
Proceedings of the Second Workshop in South East Asian Language Processing

Automatic short answer scoring is a task whose aim is to help grade written works by learners of some subject matter. In niche subject domains with small examples, existing methods primarily utilized similarity-based scoring, relying on predefined reference answers to grade each student’s answer based on the similarity to the reference. However, these reference answers are often generated from a randomly selected set of graded student answer, which may fail to represent the full range of scoring variations. We propose a semi-automatic scoring framework that enhances the selective sampling strategy for defining the reference answers through a K-center-based and a K-means-based sampling method. Our results demonstrate that our framework outperforms previous similarity-based scoring methods on a dataset with Thai and English. Moreover, it achieves competitive performance compared to human reference performance and LLMs.