Ismael García-Varea

Also published as: Ismael García Varea, Ismael Garcia-Varea

In this paper, we present the Thot toolkit, a set of tools to train phrase-based models for statistical machine translation, which is publicly available as open source software. The toolkit obtains phrase-based models from word-based alignment models; to our knowledge, this functionality has not been offered by any publicly available toolkit. The Thot toolkit also implements a new method for estimating phrase models, which yields more complete phrase models than the methods described in the literature, including a segmentation length submodel. The toolkit output can be produced in different formats so that it can be used by other statistical machine translation tools such as Pharaoh, a beam search decoder for phrase-based alignment models, which was used to perform translation experiments with the generated models. Additionally, the Thot toolkit can be used to obtain the best phrase-level alignment for a sentence pair.

2003

On the use of statistical machine-translation techniques within a memory-based translation system (AMETRA)
Daniel Ortíz | Ismael García-Varea | Francisco Casacuberta | Antonio Lagarda | Jorge González
Proceedings of Machine Translation Summit IX: Papers

The goal of the AMETRA project is to make a computer-assisted translation tool from the Spanish language to the Basque language under the memory-based translation framework. The system is based on a large collection of bilingual word-segments. These segments are obtained using linguistic or statistical techniques from a Spanish-Basque bilingual corpus consisting of sentences extracted from the Basque Country’s official government record. One of the tasks within the global information document of the AMETRA project is to study the combination of well-known statistical techniques for the translation of short sequences and techniques for memory-based translation. In this paper, we address the problem of constructing a statistical module to deal with the task of translating segments. The task undertaken in the AMETRA project is compared with other existing translation tasks. This study includes the results of some preliminary experiments we have carried out using well-known statistical machine translation tools and techniques.

2002

Improving Alignment Quality in Statistical Machine Translation Using Context-dependent Maximum Entropy Models
Ismael García Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
COLING 2002: The 19th International Conference on Computational Linguistics

Efficient integration of maximum entropy lexicon models within the training of statistical alignment models
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

Maximum entropy (ME) models have been successfully applied to many natural language problems. In this paper, we show how to integrate ME models efficiently within a maximum likelihood training scheme of statistical machine translation models. Specifically, we define a set of context-dependent ME lexicon models and we show how to train these ME models efficiently within the conventional expectation-maximization (EM) training of statistical translation models. Experimental results are also given to demonstrate how these ME models improve the results obtained with the traditional translation models. The results are reported in terms of alignment quality, measured by comparing the resulting alignments with manually annotated reference alignments.

2001

Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Search algorithms for statistical machine translation based on dynamic programming and pruning techniques
Ismael García-Varea | Francisco Casacuberta
Proceedings of Machine Translation Summit VIII

The increasing interest in the statistical approach to Machine Translation is due to the development of effective algorithms for training the probabilistic models proposed so far. However, one of the open problems with statistical machine translation is the design of efficient algorithms for translating a given input string. For some interesting models, only (good) approximate solutions can be found. Recently, a dynamic programming-like algorithm for the IBM-Model 2 has been proposed, which is based on an iterative process of solution refinement. A new dynamic programming-like algorithm is proposed here to deal with more complex IBM models (models 3 to 5). The computational cost of the algorithm is reduced by using an alignment-based pruning technique. Experimental results with the so-called “Tourist Task” are also presented.