Celia Soler Uguet


2022

pdf bib
A Comparison of Data Filtering Methods for Neural Machine Translation
Fred Bane | Celia Soler Uguet | Wiktor Stribiżew | Anna Zaretskaya
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

With the increasing availability of large-scale parallel corpora derived from web crawling and bilingual text mining, data filtering is becoming an increasingly important step in neural machine translation (NMT) pipelines. This paper applies several available tools to the task of data filtration, and compares their performance in filtering out different types of noisy data. We also study the effect of filtration with each tool on model performance in the downstream task of NMT by creating a dataset containing a combination of clean and noisy data, filtering the data with each tool, and training NMT engines using the resulting filtered corpora. We evaluate the performance of each engine with a combination of direct assessment (DA) and automated metrics. Our best results are obtained by training for a short time on all available data then filtering the corpus with cross-entropy filtering and training until convergence.

pdf bib
Comparing Multilingual NMT Models and Pivoting
Celia Soler Uguet | Fred Bane | Anna Zaretskaya | Tània Blanch Miró
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Following recent advancements in multilingual machine translation at scale, our team carried out tests to compare the performance of multilingual models (M2M from Facebook and multilingual models from Helsinki-NLP) with a two-step translation process using English as a pivot language. Direct assessment by linguists rated translations produced by pivoting as consistently better than those obtained from multilingual models of similar size, while automated evaluation with COMET suggested relative performance was strongly impacted by domain and language family.