Garry Kuwanto


2024

MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
David Anugraha | Garry Kuwanto | Lucky Susanto | Derry Tanti Wijaya | Genta Winata
Proceedings of the Ninth Conference on Machine Translation

We present MetaMetrics-MT, an innovative metric designed to evaluate machine translation (MT) tasks by aligning closely with human preferences through Bayesian optimization with Gaussian Processes. MetaMetrics-MT enhances existing MT metrics by optimizing their correlation with human judgments. Our experiments on the WMT24 metric shared task dataset demonstrate that MetaMetrics-MT outperforms all existing baselines, setting a new benchmark for state-of-the-art performance in the reference-based setting. Furthermore, it achieves results comparable to leading metrics in the reference-free setting while offering greater efficiency.

Mitigating Translationese in Low-resource Languages: The Storyboard Approach
Garry Kuwanto | Eno-Abasi E. Urua | Priscilla Amondi Amuok | Shamsuddeen Hassan Muhammad | Anuoluwapo Aremu | Verrah Otiende | Loice Emma Nanyanga | Teresiah W. Nyoike | Aniefon D. Akpan | Nsima Ab Udouboh | Idongesit Udeme Archibong | Idara Effiong Moses | Ifeoluwatayo A. Ige | Benjamin Ajibade | Olumide Benjamin Awokoya | Idris Abdulmumin | Saminu Mohammad Aliyu | Ruqayya Nasir Iro | Ibrahim Said Ahmad | Deontae Smith | Praise-EL Michaels | David Ifeoluwa Adelani | Derry Tanti Wijaya | Anietie Andy
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, whereas our method shows lower accuracy but better fluency in the target language.

2023

DUnE: Dataset for Unified Editing
Afra Akyürek | Eric Pan | Garry Kuwanto | Derry Wijaya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Even the most advanced language models remain susceptible to errors, which makes it necessary to modify these models without initiating a comprehensive retraining process. Model editing refers to the modification of a model’s knowledge or representations in a manner that produces the desired outcomes. Prior research has primarily centered on editing factual data, e.g., “Messi plays for Inter Miami”, confining the definition of an edit to a knowledge triplet, i.e., (subject, object, relation). However, as the applications of language models expand, so do the diverse ways in which we wish to edit and refine their outputs. In this study, we broaden the scope of the editing problem to include an array of editing cases, such as debiasing and rectifying reasoning errors, and define an edit as any natural language expression that solicits a change in the model’s outputs. We introduce DUnE, an editing benchmark where edits are natural language sentences, and propose that DUnE presents a challenging yet relevant task. To substantiate this claim, we conduct an extensive series of experiments testing various editing approaches to address DUnE, demonstrating their respective strengths and weaknesses. We argue that retrieval-augmented language modeling can outperform specialized editing techniques and that neither set of approaches has fully solved the generalized editing problem covered by our benchmark.