@inproceedings{barancikova-bojar-2025-intrinsic,
title = "Intrinsic vs. Extrinsic Evaluation of {C}zech Sentence Embeddings: Semantic Relevance Doesn{'}t Help with {MT} Evaluation",
author = "Baran{\v{c}}{\'i}kov{\'a}, Petra and
Bojar, Ond{\v{r}}ej",
editor = "Bouillon, Pierrette and
Gerlach, Johanna and
Girletti, Sabrina and
Volkart, Lise and
Rubino, Raphael and
Sennrich, Rico and
Farinha, Ana C. and
Gaido, Marco and
Daems, Joke and
Kenny, Dorothy and
Moniz, Helena and
Szoc, Sara",
booktitle = "Proceedings of Machine Translation Summit XX: Volume 1",
month = jun,
year = "2025",
address = "Geneva, Switzerland",
publisher = "European Association for Machine Translation",
url = "https://aclanthology.org/2025.mtsummit-1.20/",
pages = "265--275",
ISBN = "978-2-9701897-0-1",
abstract = "In this paper, we compare Czech-specific and multilingual sentence embedding models through intrinsic and extrinsic evaluation paradigms. For intrinsic evaluation, we employ Costra, a complex sentence transformation dataset, and several Semantic Textual Similarity (STS) benchmarks to assess the ability of the embeddings to capture linguistic phenomena such as semantic similarity, temporal aspects, and stylistic variations. In the extrinsic evaluation, we fine-tune each embedding model using COMET-based metrics for machine translation evaluation. Our experiments reveal an interesting disconnect: models that excel in intrinsic semantic similarity tests do not consistently yield superior performance on downstream translation evaluation tasks. Conversely, models with seemingly over-smoothed embedding spaces can, through fine-tuning, achieve excellent results. These findings highlight the complex relationship between semantic property probes and downstream task, emphasizing the need for more research into ``operationalizable semantics'' in sentence embeddings, or more in-depth downstream tasks datasets (here translation evaluation)."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="barancikova-bojar-2025-intrinsic">
<titleInfo>
<title>Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn’t Help with MT Evaluation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Petra</namePart>
<namePart type="family">Barančíková</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ondřej</namePart>
<namePart type="family">Bojar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of Machine Translation Summit XX: Volume 1</title>
</titleInfo>
<name type="personal">
<namePart type="given">Pierrette</namePart>
<namePart type="family">Bouillon</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Johanna</namePart>
<namePart type="family">Gerlach</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sabrina</namePart>
<namePart type="family">Girletti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lise</namePart>
<namePart type="family">Volkart</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Raphael</namePart>
<namePart type="family">Rubino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rico</namePart>
<namePart type="family">Sennrich</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ana</namePart>
<namePart type="given">C</namePart>
<namePart type="family">Farinha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marco</namePart>
<namePart type="family">Gaido</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joke</namePart>
<namePart type="family">Daems</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dorothy</namePart>
<namePart type="family">Kenny</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Helena</namePart>
<namePart type="family">Moniz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sara</namePart>
<namePart type="family">Szoc</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>European Association for Machine Translation</publisher>
<place>
<placeTerm type="text">Geneva, Switzerland</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">978-2-9701897-0-1</identifier>
</relatedItem>
<abstract>In this paper, we compare Czech-specific and multilingual sentence embedding models through intrinsic and extrinsic evaluation paradigms. For intrinsic evaluation, we employ Costra, a complex sentence transformation dataset, and several Semantic Textual Similarity (STS) benchmarks to assess the ability of the embeddings to capture linguistic phenomena such as semantic similarity, temporal aspects, and stylistic variations. In the extrinsic evaluation, we fine-tune each embedding model using COMET-based metrics for machine translation evaluation. Our experiments reveal an interesting disconnect: models that excel in intrinsic semantic similarity tests do not consistently yield superior performance on downstream translation evaluation tasks. Conversely, models with seemingly over-smoothed embedding spaces can, through fine-tuning, achieve excellent results. These findings highlight the complex relationship between semantic property probes and downstream tasks, emphasizing the need for more research into “operationalizable semantics” in sentence embeddings, or more in-depth downstream task datasets (here, translation evaluation).</abstract>
<identifier type="citekey">barancikova-bojar-2025-intrinsic</identifier>
<location>
<url>https://aclanthology.org/2025.mtsummit-1.20/</url>
</location>
<part>
<date>2025-06</date>
<extent unit="page">
<start>265</start>
<end>275</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn’t Help with MT Evaluation
%A Barančíková, Petra
%A Bojar, Ondřej
%Y Bouillon, Pierrette
%Y Gerlach, Johanna
%Y Girletti, Sabrina
%Y Volkart, Lise
%Y Rubino, Raphael
%Y Sennrich, Rico
%Y Farinha, Ana C.
%Y Gaido, Marco
%Y Daems, Joke
%Y Kenny, Dorothy
%Y Moniz, Helena
%Y Szoc, Sara
%S Proceedings of Machine Translation Summit XX: Volume 1
%D 2025
%8 June
%I European Association for Machine Translation
%C Geneva, Switzerland
%@ 978-2-9701897-0-1
%F barancikova-bojar-2025-intrinsic
%X In this paper, we compare Czech-specific and multilingual sentence embedding models through intrinsic and extrinsic evaluation paradigms. For intrinsic evaluation, we employ Costra, a complex sentence transformation dataset, and several Semantic Textual Similarity (STS) benchmarks to assess the ability of the embeddings to capture linguistic phenomena such as semantic similarity, temporal aspects, and stylistic variations. In the extrinsic evaluation, we fine-tune each embedding model using COMET-based metrics for machine translation evaluation. Our experiments reveal an interesting disconnect: models that excel in intrinsic semantic similarity tests do not consistently yield superior performance on downstream translation evaluation tasks. Conversely, models with seemingly over-smoothed embedding spaces can, through fine-tuning, achieve excellent results. These findings highlight the complex relationship between semantic property probes and downstream tasks, emphasizing the need for more research into “operationalizable semantics” in sentence embeddings, or more in-depth downstream task datasets (here, translation evaluation).
%U https://aclanthology.org/2025.mtsummit-1.20/
%P 265-275
[Intrinsic vs. Extrinsic Evaluation of Czech Sentence Embeddings: Semantic Relevance Doesn’t Help with MT Evaluation](https://aclanthology.org/2025.mtsummit-1.20/) (Barančíková & Bojar, MTSummit 2025)