A High-quality Seed Dataset for Italian Machine Translation

Edoardo Ferrante


Abstract
This paper describes the submission of a high-quality translation of the OLDI Seed dataset into Italian for the WMT 2024 Open Language Data Initiative shared task. The submission builds on a previous version of an Italian OLDI Seed dataset released by Haberland et al. (2024), which was produced via machine translation and partial post-editing. These data were subsequently reviewed in their entirety by two native speakers of Italian, who carried out extensive post-editing with particular attention to the idiomatic translation of named entities.
Anthology ID:
2024.wmt-1.43
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
567–569
URL:
https://aclanthology.org/2024.wmt-1.43
Cite (ACL):
Edoardo Ferrante. 2024. A High-quality Seed Dataset for Italian Machine Translation. In Proceedings of the Ninth Conference on Machine Translation, pages 567–569, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
A High-quality Seed Dataset for Italian Machine Translation (Ferrante, WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.43.pdf