Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation

Soham Bhattacharjee; Baban Gain; Asif Ekbal

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in machine translation, leveraging extensive pre-training on vast amounts of data. However, this generalist training often overlooks domain-specific nuances, leading to potential difficulties when translating specialized texts. In this study, we present a multi-domain test suite, collated from previously published datasets, designed to challenge and evaluate the translation abilities of LLMs. The test suite encompasses diverse domains such as judicial, education, literature (specifically religious texts), and noisy user-generated content from online product reviews and forums like Reddit. Each domain consists of approximately 250-300 sentences, carefully curated and randomized in the final compilation. This English-to-Hindi dataset aims to evaluate and expose the limitations of LLM-based translation systems, offering valuable insights into areas requiring further research and development. We have submitted the dataset to WMT24 Break the LLM

Anthology ID: 2024.icon-1.19
Volume: Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month: December
Year: 2024
Address: AU-KBC Research Centre, Chennai, India
Editors: Sobha Lalitha Devi, Karunesh Arora
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 169–177
URL: https://aclanthology.org/2024.icon-1.19/
Cite (ACL): Soham Bhattacharjee, Baban Gain, and Asif Ekbal. 2024. Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 169–177, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal): Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation (Bhattacharjee et al., ICON 2024)
PDF: https://aclanthology.org/2024.icon-1.19.pdf
