CoST of breaking the LLMs

Ananya Mukherjee, Saumitra Yadav, Manish Shrivastava


Abstract
This paper presents an evaluation of 16 machine translation systems submitted to the Shared Task of the Ninth Conference on Machine Translation (WMT24) for the English-Hindi (en-hi) language pair, using our Complex Structures Test (CoST) suite. In line with this year’s test suite sub-task theme, “Help us break LLMs”, we curated a comprehensive test suite of diverse datasets across various categories, including autobiography, poetry, legal, conversation, play, narration, technical, and mixed genres. Our evaluation reveals that all systems struggle significantly with archaic styles of text, such as legal and technical writing, and with text that has a creative twist, such as the conversation and poetry datasets, highlighting their weaknesses in handling the complex linguistic structures and stylistic nuances inherent in these text types. The evaluation identifies the strengths and limitations of the submitted models and points to specific areas where further research and development are needed to improve their performance. Our test suite is available at https://github.com/AnanyaCoder/CoST-WMT-24-Test-Suite-Task.
Anthology ID:
2024.wmt-1.24
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
299–306
URL:
https://aclanthology.org/2024.wmt-1.24
Cite (ACL):
Ananya Mukherjee, Saumitra Yadav, and Manish Shrivastava. 2024. CoST of breaking the LLMs. In Proceedings of the Ninth Conference on Machine Translation, pages 299–306, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CoST of breaking the LLMs (Mukherjee et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.24.pdf