Nuno Rendeiro

2016

Use of Domain-Specific Language Resources in Machine Translation
Sanja Štajner | Andreia Querido | Nuno Rendeiro | João António Rodrigues | António Branco
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we address the problem of Machine Translation (MT) for a specialised domain in a language pair for which only a very small domain-specific parallel corpus is available. We conduct a series of experiments using a purely phrase-based SMT (PBSMT) system and a hybrid MT system (TectoMT), testing three different strategies to overcome the small amount of in-domain training data. Our results show that adding a small in-domain bilingual terminology to the small in-domain training corpus yields the largest improvements for the hybrid MT system, while the PBSMT system achieves its best results when a combination of in-domain bilingual terminology and a larger out-of-domain corpus is added. We focus on qualitative human evaluation of the output of the two best systems (one for each approach) and perform a systematic in-depth error analysis, which reveals advantages of the hybrid MT system over the pure PBSMT system for this specific task.

Bootstrapping a Hybrid MT System to a New Language Pair
João António Rodrigues | Nuno Rendeiro | Andreia Querido | Sanja Štajner | António Branco
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The usual concern when opting for a rule-based or a hybrid machine translation (MT) system is how much effort is required to adapt the system to a different language pair or a new domain. In this paper, we describe a way of adapting an existing hybrid MT system to a new language pair, and show that such a system can outperform a standard phrase-based statistical machine translation system with an average of 10 person-months of work. This is especially important in the case of domain-specific MT, for which there is not enough parallel data to train a statistical machine translation system.
