David Anugraha


2024

Predicting Machine Translation Performance on Low-Resource Languages: The Role of Domain Similarity
Eric Khiu | Hasti Toossi | David Anugraha | Jinyu Liu | Jiaxu Li | Juan Flores | Leandro Roman | A. Seza Doğruöz | En-Shiun Lee
Findings of the Association for Computational Linguistics: EACL 2024

Fine-tuning and testing a multilingual large language model is a challenge for low-resource languages (LRLs) since it is an expensive process. While previous studies have predicted the performance of natural language processing (NLP) tasks using machine learning methods, they primarily focus on high-resource languages, overlooking LRLs and shifts across domains. Focusing on LRLs, we use classical regression models to investigate three factors that can potentially affect model performance: the size of the fine-tuning corpus, domain similarity between fine-tuning and testing corpora, and language similarity between source and target languages. Our results indicate that domain similarity has the greatest impact on predicting the performance of machine translation models.

MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
David Anugraha | Garry Kuwanto | Lucky Susanto | Derry Tanti Wijaya | Genta Winata
Proceedings of the Ninth Conference on Machine Translation

We present MetaMetrics-MT, an innovative metric designed to evaluate machine translation (MT) tasks by aligning closely with human preferences through Bayesian optimization with Gaussian Processes. MetaMetrics-MT enhances existing MT metrics by optimizing their correlation with human judgments. Our experiments on the WMT24 metric shared task dataset demonstrate that MetaMetrics-MT outperforms all existing baselines, setting a new benchmark for state-of-the-art performance in the reference-based setting. Furthermore, it achieves comparable results to leading metrics in the reference-free setting, offering greater efficiency.