NICE: Neural Integrated Custom Engines
Daniel Marín Buj
Daniel Ibáñez García
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
In this paper, we present a machine translation system implemented by the Translation Centre for the Bodies of the European Union (CdT). The main goal of this project is to create domain-specific machine translation engines in order to support machine translation services and applications to the Translation Centre’s clients. In this article, we explain the entire implementation process of NICE: Neural Integrated Custom Engines. We describe the problems identified and the solutions provided, and present the final results for different language pairs. Finally, we describe the work that will be done on this project in the future.
Filtering of Noisy Parallel Corpora Based on Hypothesis Generation
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
The filtering task of noisy parallel corpora in WMT2019 aims to challenge participants to create filtering methods to be useful for training machine translation systems. In this work, we introduce a noisy parallel corpora filtering system based on generating hypotheses by means of a translation model. We train translation models in both language pairs: Nepali–English and Sinhala–English using provided parallel corpora. We select the training subset for three language pairs (Nepali, Sinhala and Hindi to English) jointly using bilingual cross-entropy selection to create the best possible translation model for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We compute the smoothed BLEU score between the target sentence and generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences which can lower the translation score. These heuristics are based on sentence length, source and target similarity and source language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model, over which we achieve significant improvements in one of the conditions in the shared task. The designed filtering system is domain independent and all experiments are conducted using neural machine translation.