Ahsan Arafat


2023

Towards Large Language Model driven Reference-less Translation Evaluation for English and Indian Languages
Mujadia Vandan | Mishra Pruthwik | Ahsan Arafat | M. Sharma Dipti
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

With the primary focus on evaluating the effectiveness of large language models for automatic reference-less translation assessment, this work presents our experiments on mimicking human direct assessment to evaluate the quality of translations in English and Indian languages. We constructed a translation evaluation task where we performed zero-shot learning, in-context example-driven learning, and fine-tuning of large language models to provide a score out of 100, where 100 represents a perfect translation and 1 represents a poor translation. We compared the performance of our trained systems with existing methods such as COMET, BERT-Scorer, and LABSE, and found that the LLM-based evaluator (LLaMA2-13B) achieves a comparable or higher overall correlation with human judgments for the considered Indian language pairs (see Figure 1).
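
The abstract describes prompting an LLM to act as a reference-less quality estimator that returns a 1-100 score. The snippet below is a minimal sketch of the zero-shot variant of that idea, assuming a Hugging Face instruction-tuned model; the prompt wording, model identifier, language pair, and score parsing are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of zero-shot, reference-less translation scoring with an LLM.
# The prompt, model name, and parsing are assumptions for illustration;
# the paper's actual prompts, in-context examples, and fine-tuning differ.
import re
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-13b-chat-hf",  # assumed checkpoint; any instruct LLM works
    device_map="auto",
)

def score_translation(source: str, translation: str) -> int | None:
    """Ask the LLM for a 1-100 quality score without a reference translation."""
    prompt = (
        "Rate the quality of the following translation on a scale of 1 to 100, "
        "where 100 is a perfect translation and 1 is a poor translation. "
        "Reply with only the number.\n"
        f"Source (English): {source}\n"
        f"Translation (Hindi): {translation}\n"
        "Score:"
    )
    output = generator(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    # The pipeline echoes the prompt, so parse only the newly generated text.
    match = re.search(r"\b(\d{1,3})\b", output[len(prompt):])
    return int(match.group(1)) if match else None

print(score_translation("The weather is pleasant today.", "आज मौसम सुहावना है।"))
```

The in-context and fine-tuned variants reported in the paper would extend this by adding scored example pairs to the prompt or by training the model on human direct-assessment scores, after which the predicted scores can be correlated with human judgments (as COMET, BERT-Scorer, and LABSE are in the comparison).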