Jesús González-Rubio

Also published as: Jesús González Rubio, Jesus Gonzalez-Rubio

2019

Webinterpret Submission to the WMT2019 Shared Task on Parallel Corpus Filtering
Jesús González-Rubio
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This document describes the participation of Webinterpret in the shared task on parallel corpus filtering at the Fourth Conference on Machine Translation (WMT 2019). Here, we describe the main characteristics of our approach and discuss the results obtained on the data sets published for the shared task.

2018

MAJE Submission to the WMT2018 Shared Task on Parallel Corpus Filtering
Marina Fomicheva | Jesús González-Rubio
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the participation of Webinterpret in the shared task on parallel corpus filtering at the Third Conference on Machine Translation (WMT 2018). The paper describes the main characteristics of our approach and discusses the results obtained on the data sets published for the shared task.

2016

Beyond Prefix-Based Interactive Translation Prediction
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta | José Miguel Benedi Ruiz
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

2014

This paper describes a pilot study with a computer-assisted translation workbench aimed at testing the integration of online learning (OL) and active learning (AL) features. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. User activity data were collected from five beta testers using key-logging and eye-tracking. User feedback was also collected at the end of the experiments in the form of retrospective think-aloud protocols. We found that OL performs better than ITP, especially in terms of translation speed. In addition, AL provides better translation quality than ITP for the same levels of user effort. We plan to incorporate these features in the final version of the workbench.

Inference of Phrase-Based Translation Models via Minimum Description Length
Jesús González-Rubio | Francisco Casacuberta
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

This paper describes the field trial and subsequent evaluation of a post-editing workbench which is currently under development in the EU-funded CasMaCat project. Building on user evaluations of the initial prototype, this second prototype of the workbench includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat’s own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load, as reflected in the post-editors' gaze activity. These quantitative results are combined with the qualitative results derived from user questionnaires and interviews conducted with all the participants.

FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task
José Guilherme Camargo de Souza | Jesús González-Rubio | Christian Buck | Marco Turchi | Matteo Negri
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

Improving the minimum Bayes’ risk combination of machine translation systems
Jesús González-Rubio | Francisco Casacuberta
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

We investigate the problem of combining the outputs of different translation systems into a minimum Bayes’ risk consensus translation. We explore different risk formulations based on the BLEU score, and provide a dynamic programming decoding algorithm for each of them. In our experiments, these algorithms generated consensus translations with lower risk than previous proposals, and did so more efficiently.

Empirical study of a two-step approach to estimate translation quality
Jesús González-Rubio | J. Ramón Navarro-Cerdán | Francisco Casacuberta
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

We present a method to estimate the quality of automatic translations when reference translations are not available. Quality estimation is addressed as a two-step regression problem where multiple features are combined to predict a quality score. Given a set of features, we aim to automatically extract the variables that best explain translation quality, and use them to predict the quality score. The soundness of our approach is assessed by the encouraging results obtained in an exhaustive experimentation with several feature sets. Moreover, the studied approach is highly scalable, allowing us to employ hundreds of features to predict translation quality.

Interactive Machine Translation using Hierarchical Translation Models
Jesús González-Rubio | Daniel Ortiz-Martínez | José-Miguel Benedí | Francisco Casacuberta
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

This paper presents the submissions of the PRHLT group for the evaluation campaign of the International Workshop on Spoken Language Translation. We focus on the development of reliable translation systems between syntactically different languages (DIALOG task) and on the efficient training of SMT models in resource-rich scenarios (TALK task).

2010

Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Jesús González-Rubio | Jorge Civera | Alfons Juan | Francisco Casacuberta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, great effort is being devoted to the digitalisation of large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the general public's access to their content because of the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also at annotating them with their corresponding text transcriptions and translations into modern languages. Unfortunately, few electronic resources that natural language processing techniques can exploit are available for ancient languages. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging but interesting nature for future development.

Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Jesús González-Rubio | Daniel Ortiz-Martínez | Francisco Casacuberta
Proceedings of the ACL 2010 Conference Short Papers