José B. Mariño

Also published as: José Mariño, Jose B. Marino

2013

The TALP-UPC Phrase-Based Translation Systems for WMT13: System Combination with Morphology Generation, Domain Adaptation and Corpus Filtering
Lluís Formiga | Marta R. Costa-jussà | José B. Mariño | José A. R. Fonollosa | Alberto Barrón-Cedeño | Lluís Màrquez
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

The TALP-UPC phrase-based translation systems for WMT12: Morphology simplification and domain adaptation
Lluís Formiga | Carlos A. Henríquez Q. | Adolfo Hernández | José B. Mariño | Enric Monte | José A. R. Fonollosa
Proceedings of the Seventh Workshop on Statistical Machine Translation

Improving English to Spanish Out-of-Domain Translations by Morphology Generalization and Generation
Lluís Formiga | Adolfo Hernández | José B. Mariño | Enric Monte
Workshop on Monolingual Machine Translation

This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor-quality translation when translating out-of-domain. In detail, this approach first translates into simplified Spanish forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source and target-language sentences. The results obtained highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.

2011

Deriving translation units using small additional corpora
Carlos A. Henríquez Q. | José B. Mariño | Rafael E. Banchs
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

We present a novel strategy to derive new translation units using an additional bilingual corpus and a previously trained SMT system. The units were used to adapt the SMT system. The derivation process can be applied when the additional corpus is very small compared with the original training corpus, and it does not require computing new word alignments using all corpora. The strategy is based on the Levenshtein distance and its resulting path. We report a statistically significant improvement, with a confidence level of 99%, when adapting an Ngram-based Catalan-Spanish system using an additional corpus that represents less than 0.5% of the original training corpus. The additional translation units were able to solve morphological and lexical errors and added previously unknown words to the vocabulary.

2010

Automatic and Human Evaluation Study of a Rule-based and a Statistical Catalan-Spanish Machine Translation Systems
Marta R. Costa-jussà | Mireia Farrús | José B. Mariño | José A. R. Fonollosa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have been widely used in recent years, one of the aims in the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (specifically, statistical) Catalan-Spanish machine translation system, both freely available on the web. The translation quality analysis is performed in two different domains: journalistic and medical. The systems are evaluated using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanish-to-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction it is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be penalized more by human judgements than the errors of the rule-based system. This can be explained by the fact that statistical errors are usually unexpected and do not follow any pattern.

Linguistic-based Evaluation Criteria to identify Statistical Machine Translation Errors
Mireia Farrús | Marta R. Costa-jussà | José B. Mariño | José A. R. Fonollosa
Proceedings of the 14th Annual Conference of the European Association for Machine Translation

UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system
Carlos Henríquez | Marta R. Costa-jussà | Vidas Daudaravicius | Rafael E. Banchs | José B. Mariño
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These novel segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information or Gravity-Counts. The analysis of translation results allows us to divide the measures into three groups. First, Log-likelihood, Chi-squared and T-score tend to combine high-frequency words, and the resulting collocation segments are very short. They improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and the resulting collocation segments are short. They improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine high- and low-frequency words, and the resulting collocation segments are long. However, in this case, the SMT system is not improved. Thus, the road-map for translation system improvement is to introduce new phrases with either low-frequency or high-frequency words. It is hard to improve translation quality by introducing new phrases that mix low- and high-frequency words. Experimental results are reported for the French-to-English IWSLT 2010 evaluation, where our system was ranked third out of nine systems.

Using Collocation Segmentation to Augment the Phrase Table
Carlos A. Henríquez Q. | Marta Ruiz Costa-jussà | Vidas Daudaravicius | Rafael E. Banchs | José B. Mariño
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

Improving a Catalan-Spanish Statistical Translation System using Morphosyntactic Knowledge
Mireia Farrús | Marta R. Costa-jussà | Marc Poch | Adolfo Hernández | José B. Mariño
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems’ architecture and outlines the translation schemes we have used, mainly focusing on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.

2007

This paper describes TALPtuples, the 2007 N-gram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) in Barcelona. Emphasis is put on improvements and extensions of the system of previous years. Mainly, these include optimizing alignment parameters as a function of translation metric scores and rescoring with a neural network language model. Results on two translation directions are reported, namely from Arabic and Chinese into English, thoroughly explaining all language-related preprocessing and translation schemes.

Extending MARIE: an N-gram-based SMT decoder
Josep M. Crego | José B. Mariño
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Syntax-enhanced n-gram-based SMT
Josep M. Crego | José B. Mariño
Proceedings of Machine Translation Summit XI: Papers

Analysis and System Combination of Phrase- and N-Gram-Based Statistical Machine Translation Systems
Marta R. Costa-jussà | Josep M. Crego | David Vilar | José A. R. Fonollosa | José B. Mariño | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

2006

Integration of POStag-based Source Reordering into SMT Decoding by an Extended Search Graph
Josep M. Crego | José B. Mariño
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper presents a reordering framework for statistical machine translation (SMT) where source-side reorderings are integrated into SMT decoding, allowing for a highly constrained reordered search graph. The monotone search is extended by means of a set of reordering patterns (linguistically motivated rewrite patterns). Patterns are automatically learnt in training from word-to-word alignments and source-side Part-Of-Speech (POS) tags. Traversing the extended search graph, the decoder evaluates every hypothesis making use of a group of widely used SMT models and helped by an additional Ngram language model of source-side POS tags. Experiments are reported on the Euparl task (Spanish-to-English and English-to-Spanish). Results are presented regarding translation accuracy (using human and automatic evaluations) and computational efficiency, showing significant improvements in translation quality for both translation directions at a very low computational cost.

2005

Statistical Machine Translation of Euparl Data by using Bilingual N-grams
Rafael E. Banchs | Josep M. Crego | Adrià de Gispert | Patrik Lambert | José B. Mariño
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Ngram-based versus Phrase-based Statistical Machine Translation
Josep M. Crego | Marta R. Costa-Jussa | Jose B. Marino | Jose A. R. Fonollosa
Proceedings of the Second International Workshop on Spoken Language Translation

The TALP Ngram-based SMT System for IWSLT’05
Josep M. Crego | Adria de Gispert | Jose B. Marino
Proceedings of the Second International Workshop on Spoken Language Translation

pdf bib abs

This paper describes a statistical machine translation system that uses a translation model based on bilingual n-grams. When this translation model is log-linearly combined with four specific feature functions, state-of-the-art translations are achieved for Spanish-to-English and English-to-Spanish translation tasks. Some specific results obtained for the EPPS (European Parliament Plenary Sessions) data are presented and discussed. Finally, future research issues are outlined.

Reordered Search, and Tuple Unfolding for Ngram-based SMT
Josep M. Crego | José B. Mariño | Adrià de Gispert
Proceedings of Machine Translation Summit X: Papers

In Statistical Machine Translation, the use of reordering for certain language pairs can produce a significant improvement in translation accuracy. However, the search problem is shown to be NP-hard when arbitrary reorderings are allowed. This paper addresses the question of reordering for an Ngram-based SMT approach following two complementary strategies, namely reordered search and tuple unfolding. These strategies interact to improve translation quality in a Chinese-to-English task. On the one hand, we allow an Ngram-based decoder (MARIE) to perform a reordered search over the source sentence, while combining a translation tuples Ngram model, a target language model, a word penalty and a word distance model. Interestingly, even though the translation units are learnt sequentially, the reordered search produces an improved translation. On the other hand, we allow for a modification of the translation units that unfolds the tuples, so that shorter units are learnt from a new parallel corpus, where the source sentences are reordered according to the target language. This tuple unfolding technique reduces data sparseness and, when combined with the reordered search, further boosts translation performance. Translation accuracy and efficiency results are reported for the IWSLT 2004 Chinese-to-English task.
