Juan Pablo Martínez
2024
Expanding the FLORES+ Multilingual Benchmark with Translations for Aragonese, Aranese, Asturian, and Valencian
Juan Antonio Perez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Aaron Galiano Jimenez, Antoni Oliver, Claudi Aventín-Boya, Alejandro Pardos, Cristina Valdés, Jusèp Loís Sans Socasau, Juan Pablo Martínez
Proceedings of the Ninth Conference on Machine Translation
In this paper, we describe the process of creating the FLORES+ datasets for several Romance languages spoken in Spain, namely Aragonese, Aranese, Asturian, and Valencian. The Aragonese and Aranese datasets are entirely new additions to the FLORES+ multilingual benchmark. An initial version of the Asturian dataset was already available in FLORES+, so our work focused on thoroughly revising it. Similarly, FLORES+ already included a Catalan dataset, which we adapted to the Valencian variety spoken in the Valencian Community. The Aragonese, Aranese, and revised Asturian FLORES+ datasets were developed as part of the WMT24 shared task on translation into low-resource languages of Spain.