Pachara Boonsarngsuk
2026
Machine Translation Evaluation Eng–Thai MQM Ranking Dataset
Phichet Phuangrot | Natdanai Trintawat | Kanawat Vilasri | Yanapat Patcharawiwatpong | Pachara Boonsarngsuk | Nat Pavasant | Ekapol Chuangsuwanich
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
We introduce MEET-MR (Machine Translation English–Thai MQM and Ranking Dataset), a comprehensive benchmark for evaluating English–Thai machine translation systems. The dataset is constructed using the Multidimensional Quality Metrics (MQM) annotation framework, providing fine-grained human judgements of translation quality. In addition, MEET-MR includes human preference rankings and reference translations, enabling both absolute and relative assessment of translation quality. The dataset covers nine domains, providing linguistic and contextual diversity. By combining high-quality reference translations, objective MQM error annotations, and subjective preference rankings, MEET-MR serves as a valuable resource for studying translation quality estimation, model alignment with human evaluation, and cross-domain performance in English–Thai machine translation. MEET-MR is publicly available at https://huggingface.co/datasets/Chula-AI/MEET-MR.
2025
Evaluating Sampling Strategies for Similarity-Based Short Answer Scoring: a Case Study in Thailand
Pachara Boonsarngsuk | Pacharapon Arpanantikul | Supakorn Hiranwipas | Wipu Watcharakajorn | Ekapol Chuangsuwanich
Proceedings of the Second Workshop in South East Asian Language Processing
Automatic short answer scoring aims to help grade written work by learners of a subject. In niche subject domains with few examples, existing methods primarily rely on similarity-based scoring, grading each student's answer by its similarity to predefined reference answers. However, these reference answers are often drawn from a randomly selected set of graded student answers, which may fail to represent the full range of scoring variations. We propose a semi-automatic scoring framework that improves the sampling strategy for selecting reference answers through K-center-based and K-means-based sampling methods. Our results demonstrate that our framework outperforms previous similarity-based scoring methods on a dataset containing Thai and English answers. Moreover, it achieves performance competitive with human reference grading and LLMs.