Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation

Soham Bhattacharjee; Baban Gain; Asif Ekbal

doi:10.18653/v1/2024.wmt-1.27

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in machine translation, leveraging extensive pre-training on vast amounts of data. However, this generalist training often overlooks domain-specific nuances, which can cause difficulties when translating specialized texts. In this study, we present a multi-domain test suite, collated from previously published datasets, designed to challenge and evaluate the translation abilities of LLMs. The test suite covers diverse domains: judicial, education, literature (specifically religious texts), and noisy user-generated content from online product reviews and forums such as Reddit. Each domain contributes approximately 250-300 sentences, carefully curated and randomized in the final compilation. This English-to-Hindi dataset aims to evaluate and expose the limitations of LLM-based translation systems, offering insights into areas requiring further research and development. We submitted the dataset to the WMT24 "Break the LLM" subtask, and this paper presents our findings. The code and the dataset are publicly available at https://github.com/sohamb37/wmt24-test-suite
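The compilation step described in the abstract (a few hundred sentences per domain, shuffled together in the final file) could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `build_test_suite`, the per-domain cap, the fixed seed, and the toy sentences are all assumptions made for the example.

```python
import random


def build_test_suite(domain_sentences, per_domain=300, seed=0):
    """Sample up to `per_domain` sentences from each domain, then shuffle.

    Returns a list of (domain, sentence) pairs so that the domain label
    is still available after the sentences have been mixed together.
    """
    rng = random.Random(seed)  # fixed seed so the compilation is reproducible
    suite = []
    for domain, sentences in domain_sentences.items():
        k = min(per_domain, len(sentences))  # never ask for more than exists
        for sentence in rng.sample(sentences, k):
            suite.append((domain, sentence))
    rng.shuffle(suite)  # randomize order across domains
    return suite


# Hypothetical toy input; the real suite draws on previously published datasets.
toy = {
    "judicial": ["The court adjourned.", "The appeal was dismissed.", "Bail was granted."],
    "education": ["Submit the assignment.", "The exam is on Monday.", "Open your textbook."],
    "literature": ["In the beginning.", "Blessed are the meek.", "Seek and you shall find."],
    "noisy": ["gr8 product lol", "dont buy this!!", "works fine tbh"],
}

suite = build_test_suite(toy, per_domain=2)
for domain, sentence in suite:
    print(f"{domain}\t{sentence}")
```

Keeping the domain label next to each sentence is a design choice for this sketch: it allows translation quality to be broken down per domain after the shuffled file has been translated.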

Anthology ID: 2024.wmt-1.27
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venues: WMT | WS
Publisher: Association for Computational Linguistics
Pages: 341–354
URL: https://aclanthology.org/2024.wmt-1.27/
DOI: 10.18653/v1/2024.wmt-1.27
Cite (ACL): Soham Bhattacharjee, Baban Gain, and Asif Ekbal. 2024. Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation. In Proceedings of the Ninth Conference on Machine Translation, pages 341–354, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation (Bhattacharjee et al., WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.27.pdf