Soham Bhattacharjee

2024

Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation
Soham Bhattacharjee | Baban Gain | Asif Ekbal
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Large Language Models (LLMs) have demonstrated impressive capabilities in machine translation, leveraging extensive pre-training on vast amounts of data. However, this generalist training often overlooks domain-specific nuances, leading to potential difficulties when translating specialized texts. In this study, we present a multi-domain test suite, collated from previously published datasets, designed to challenge and evaluate the translation abilities of LLMs. The test suite encompasses diverse domains such as judicial, education, literature (specifically religious texts), and noisy user-generated content from online product reviews and forums like Reddit. Each domain consists of approximately 250-300 sentences, carefully curated and randomized in the final compilation. This English-to-Hindi dataset aims to evaluate and expose the limitations of LLM-based translation systems, offering valuable insights into areas requiring further research and development. We have submitted the dataset to WMT24 Break the LLM

Domain Dynamics: Evaluating Large Language Models in English-Hindi Translation
Soham Bhattacharjee | Baban Gain | Asif Ekbal
Proceedings of the Ninth Conference on Machine Translation

Co-authors

Asif Ekbal 2
Baban Gain 2
