Cristina Valdés


2023

TAN-IBE: Neural Machine Translation for the romance languages of the Iberian Peninsula
Antoni Oliver | Mercè Vàzquez | Marta Coll-Florit | Sergi Álvarez | Víctor Suárez | Claudi Aventín-Boya | Cristina Valdés | Mar Font | Alejandro Pardos
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

The main goal of this project is to explore techniques for training NMT systems for Spanish, Portuguese, Catalan, Galician, Asturian, Aragonese and Aranese. These languages all belong to the Romance family, but they differ greatly in the linguistic resources available: Asturian, Aragonese and Aranese can be considered low-resource languages. This makes the setting well suited to exploring training techniques for low-resource languages, such as transfer learning and multilingual systems. The first months of the project have been dedicated to compiling monolingual and parallel corpora for Asturian, Aragonese and Aranese.