chrF-S: Semantics Is All You Need

Ananya Mukherjee, Manish Shrivastava


Abstract
BLEU and chrF++ are widely used reference-based metrics for machine translation (MT) evaluation that require no training and are language-independent. However, these metrics rely primarily on n-gram matching and often overlook semantic depth and contextual understanding. To address this gap, we introduce chrF-S (Semantic chrF++), an enhanced metric that integrates sentence embeddings to evaluate translation quality more comprehensively. By combining traditional character and word n-gram analysis with semantic information derived from embeddings, chrF-S captures both syntactic accuracy and sentence-level semantics. This paper describes our submission to the WMT24 shared metrics task and the development of chrF-S. According to preliminary results on the leaderboard, our metric performs on par with supervised and LLM-based metrics. By merging semantic information with n-gram precision, chrF-S improves the assessment of machine-generated translations. Our code and data will be made available at https://github.com/AnanyaCoder/chrF-S.
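The abstract does not state how the n-gram and embedding signals are combined, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a simplified character n-gram F-score (no word n-grams, unlike chrF++), a linear interpolation with a hypothetical weight `alpha`, and a bag-of-words stand-in (`toy_embed`) where a real system would use a sentence-embedding model. All function names here are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplification)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_fscore(hyp, ref, max_n=3, beta=2.0):
    """Simplified chrF-style F-beta, averaged over n-gram orders 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # a string shorter than n has no n-grams of this order
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def toy_embed(text, vocab):
    """Stand-in for a sentence encoder: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]


def combined_score(hyp, ref, embed, alpha=0.5):
    """Assumed combination: weighted average of lexical and semantic scores."""
    lexical = ngram_fscore(hyp, ref)
    semantic = max(0.0, cosine(embed(hyp), embed(ref)))
    return alpha * lexical + (1 - alpha) * semantic


ref = "the cat sat on the mat"
hyp = "the cat is on the mat"
vocab = sorted(set((ref + " " + hyp).split()))
score = combined_score(hyp, ref, lambda t: toy_embed(t, vocab))
print(round(score, 3))
```

With a real sentence encoder in place of `toy_embed`, the semantic term could reward paraphrases that share few n-grams with the reference, which is the gap the abstract describes.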
Anthology ID:
2024.wmt-1.33
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
470–474
URL:
https://aclanthology.org/2024.wmt-1.33
Cite (ACL):
Ananya Mukherjee and Manish Shrivastava. 2024. chrF-S: Semantics Is All You Need. In Proceedings of the Ninth Conference on Machine Translation, pages 470–474, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
chrF-S: Semantics Is All You Need (Mukherjee & Shrivastava, WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.33.pdf