Felipe Sánchez‐Martínez

Also published as: Felipe Sánchez Martínez, Felipe Sánchez-Martinez, Felipe Sánchez-Martínez


2021

Surprise Language Challenge: Developing a Neural Machine Translation System between Pashto and English in Two Months
Alexandra Birch | Barry Haddow | Antonio Valerio Miceli Barone | Jindrich Helcl | Jonas Waldendorf | Felipe Sánchez Martínez | Mikel Forcada | Víctor Sánchez Cartagena | Juan Antonio Pérez-Ortiz | Miquel Esplà-Gomis | Wilker Aziz | Lina Murady | Sevi Sariisik | Peggy van der Kreeft | Kay Macquarrie
Proceedings of Machine Translation Summit XVIII: Research Track

In the media industry, the focus of global reporting can shift overnight. There is a compelling need to be able to develop new machine translation systems in a short period of time, in order to cover quickly developing stories more efficiently. As part of the EU project GoURMET, which focusses on low-resource machine translation, our media partners selected a surprise language for which a machine translation system had to be built and evaluated in two months (February and March 2021). The language selected was Pashto, an Indo-Iranian language spoken in Afghanistan, Pakistan and India. In this period we completed the full pipeline of development of a neural machine translation system: data crawling, cleaning and aligning, creating test sets, developing and testing models, and delivering them to the user partners. In this paper we describe rapid data creation and experiments with transfer learning and pretraining for this low-resource language pair. We find that starting from an existing large model pre-trained on 50 languages leads to far better BLEU scores than pretraining on one high-resource language pair with a smaller model. We also present a human evaluation of our systems, which indicates that the resulting systems perform better than a freely available commercial system when translating from English into Pashto, and similarly when translating from Pashto into English.

2020

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez
Proceedings of the 28th International Conference on Computational Linguistics

This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which there is incomplete evidence in the literature. The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features. These linguistic annotations are interleaved in the input or output streams as a single tag placed before each word. In order to measure the performance under each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful and morpho-syntactic descriptions outperform part of speech for some language pairs. In contrast, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though the use of morpho-syntactic description tags improves the grammaticality of the output. We provide a detailed analysis of the reasons behind this result.
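The interleaving scheme mentioned in this abstract (a single tag placed before each word) can be illustrated with a minimal sketch. The helper name `interleave_tags` and the `<DET>`-style tag spelling are illustrative assumptions, not the paper's actual preprocessing code.

```python
from typing import List, Sequence


def interleave_tags(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Return a single token stream in which each word is preceded by its tag.

    Hypothetical helper: the tag spelling (e.g. "<NOUN>") is an assumption.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    stream: List[str] = []
    for word, tag in zip(words, tags):
        stream.extend([tag, word])  # tag first, then the word it annotates
    return stream


words = ["the", "cats", "sleep"]
pos_tags = ["<DET>", "<NOUN>", "<VERB>"]  # part-of-speech tags
dummy_tags = ["<TAG>"] * len(words)       # dummy tags: no linguistic information

print(" ".join(interleave_tags(words, pos_tags)))    # <DET> the <NOUN> cats <VERB> sleep
print(" ".join(interleave_tags(words, dummy_tags)))  # <TAG> the <TAG> cats <TAG> sleep
```

The same helper covers all three annotation types; only the tag inventory changes (a morpho-syntactic description tag would simply be a richer string). Whether the resulting stream is fed to the encoder or produced by the decoder is a separate modelling choice.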

A multi-source approach for Breton–French hybrid machine translation
Víctor M. Sánchez-Cartagena | Mikel L. Forcada | Felipe Sánchez-Martínez
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Corpus-based approaches to machine translation (MT) have difficulties when parallel corpora for training are scarce, especially if the languages involved in the translation are highly inflected. This problem can be addressed from different perspectives, including data augmentation, transfer learning, and the use of additional resources, such as those used in rule-based MT. This paper focuses on the hybridisation of rule-based MT and neural MT for the Breton–French under-resourced language pair in an attempt to study to what extent the rule-based MT resources help improve the translation quality of the neural MT system for this particular under-resourced language pair. We combine both translation approaches in a multi-source neural MT architecture and find that, even though the rule-based system has a low performance according to automatic evaluation metrics, using it leads to improved translation quality.

An English-Swahili parallel corpus and its use for neural machine translation in the news domain
Felipe Sánchez-Martínez | Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Mikel L. Forcada | Miquel Esplà-Gomis | Andrew Secker | Susie Coleman | Julie Wall
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.

Bicleaner at WMT 2020: Universitat d’Alacant-Prompsit’s submission to the parallel corpus filtering shared task
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartagena | Jaume Zaragoza-Bernabeu | Felipe Sánchez-Martínez
Proceedings of the Fifth Conference on Machine Translation

This paper describes the joint submission of Universitat d’Alacant and Prompsit Language Engineering to the WMT 2020 shared task on parallel corpus filtering. Our submission, based on the free/open-source tool Bicleaner, enhances it with Extremely Randomised Trees and lexical similarity features that account for the frequency of the words in the parallel sentences to determine if two sentences are parallel. To train this classifier we used the clean corpora provided for the task and synthetic noisy parallel sentences. In addition, we re-score the output of Bicleaner using character-level language models and n-gram saturation.

2019

The Universitat d’Alacant Submissions to the English-to-Kazakh News Translation Task at WMT 2019
Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the two submissions of Universitat d’Alacant to the English-to-Kazakh news translation task at WMT 2019. Our submissions take advantage of monolingual data and parallel data from other language pairs by means of iterative backtranslation, pivot backtranslation and transfer learning. They also use linguistic information in two ways: morphological segmentation of Kazakh text, and integration of the output of a rule-based machine translation system. Our systems were ranked second in terms of chrF++ despite being built from an ensemble of only 2 independent training runs.

Improving Translations by Combining Fuzzy-Match Repair with Automatic Post-Editing
John Ortega | Felipe Sánchez-Martínez | Marco Turchi | Matteo Negri
Proceedings of Machine Translation Summit XVII: Research Track

Global Under-Resourced Media Translation (GoURMET)
Alexandra Birch | Barry Haddow | Ivan Tito | Antonio Valerio Miceli Barone | Rachel Bawden | Felipe Sánchez-Martínez | Mikel L. Forcada | Miquel Esplà-Gomis | Víctor Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Wilker Aziz | Andrew Secker | Peggy van der Kreeft
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

2018

UAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks
Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Mikel L. Forcada
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We describe the Universitat d’Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018. Our approach to word-level MT QE builds on previous work to mark the words in the machine-translated sentence as OK or BAD, and is extended to determine if a word or sequence of words needs to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.
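One possible reading of "uses the edit operations predicted by the word-level approach to approximate TER" is sketched below. This is an illustration under stated assumptions, not the submission's actual formula: each BAD word counts as one edit, each BAD gap as exactly one insertion, shifts are ignored, and the reference length is estimated as the MT length plus the predicted insertions. The function name `approximate_ter` is hypothetical.

```python
from typing import Sequence


def approximate_ter(word_tags: Sequence[str], gap_tags: Sequence[str]) -> float:
    """Approximate sentence-level TER from word-level OK/BAD predictions.

    word_tags: one "OK"/"BAD" label per machine-translated word.
    gap_tags:  one "OK"/"BAD" label per gap (before, between and after the
               words), where "BAD" means something must be inserted there.
    Assumptions (hypothetical, not from the paper): a BAD word costs one edit
    (substitution or deletion), a BAD gap costs exactly one insertion, shifts
    are ignored, and reference length = MT length + predicted insertions.
    """
    if len(gap_tags) != len(word_tags) + 1:
        raise ValueError("expected len(word_tags) + 1 gap labels")
    bad_words = sum(tag == "BAD" for tag in word_tags)
    insertions = sum(tag == "BAD" for tag in gap_tags)
    edits = bad_words + insertions
    estimated_ref_len = len(word_tags) + insertions
    return edits / estimated_ref_len if estimated_ref_len else 0.0


# Four MT words, one predicted BAD word and one gap needing an insertion.
print(approximate_ter(["OK", "BAD", "OK", "OK"], ["OK", "OK", "BAD", "OK", "OK"]))  # 0.4
```

Because word-level labels say nothing about word reordering, any estimate built this way can only cover insertions, deletions and substitutions, which is why shifts are left out of the sketch.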

2016

Fuzzy-match repair using black-box machine translation systems: what can be expected?
John Ortega | Felipe Sánchez-Martínez | Mikel Forcada
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

Computer-aided translation (CAT) tools often use a translation memory (TM) as the key resource to assist translators. A TM contains translation units (TUs) which are made up of source and target language segments; translators use the target segments in the TU suggested by the CAT tool by converting them into the desired translation. Proposals from TMs could be made more useful by using techniques such as fuzzy-match repair (FMR) which modify words in the target segment corresponding to mismatches identified in the source segment. Modifications in the target segment are done by translating the mismatched source sub-segments using an external source of bilingual information (SBI) and applying the translations to the corresponding positions in the target segment. Several combinations of translated sub-segments can be applied to the target segment, which can produce multiple repair candidates. We provide a formal algorithmic description of a method that is capable of using any SBI to generate all possible fuzzy-match repairs and perform an oracle evaluation on three different language pairs to ascertain the potential of the method to improve translation productivity. Using DGT-TM translation memories and the machine translation system Apertium as the single source to build repair operators in three different language pairs, we show that the best repaired fuzzy matches are consistently closer to reference translations than either machine-translated segments or unrepaired fuzzy matches.
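The candidate-generation idea in this abstract (translate mismatched source sub-segments with an SBI, then apply every combination of the resulting patches to the TM target) can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes the mismatched sub-segment pairs are already identified, uses a hypothetical word-for-word dictionary as the SBI, patches only the first occurrence of each translation in the target, and skips overlapping patches. All function names and the lexicon are illustrative.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Set, Tuple

Tokens = Tuple[str, ...]
Patch = Tuple[int, int, Tokens]  # (start in target, length replaced, replacement)


def find_sub(seq: Tokens, sub: Tokens) -> int:
    """Return the start of the first occurrence of sub in seq, or -1."""
    m = len(sub)
    for i in range(len(seq) - m + 1):
        if seq[i:i + m] == sub:
            return i
    return -1


def non_overlapping(patches: Iterable[Patch]) -> bool:
    """True if no two patches cover the same target positions."""
    ordered = sorted(patches)
    return all(a[0] + a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))


def repair_candidates(target: Tokens,
                      mismatches: List[Tuple[Tokens, Tokens]],
                      translate: Callable[[Tokens], Tokens]) -> Set[Tokens]:
    """Generate every repaired version of a fuzzy match's target segment.

    mismatches: (old, new) pairs, old from the TM source, new from the
    segment being translated. translate: any SBI mapping source to target.
    """
    # One patch per mismatch whose translated old sub-segment occurs in the target.
    patches: List[Patch] = []
    for old, new in mismatches:
        tau = translate(old)
        start = find_sub(target, tau)
        if start >= 0:
            patches.append((start, len(tau), translate(new)))
    candidates: Set[Tokens] = set()
    # Apply every non-empty, non-overlapping combination of patches.
    for k in range(1, len(patches) + 1):
        for subset in combinations(patches, k):
            if not non_overlapping(subset):
                continue
            repaired = list(target)
            # Right to left, so earlier start indices stay valid.
            for start, length, replacement in sorted(subset, reverse=True):
                repaired[start:start + length] = replacement
            candidates.add(tuple(repaired))
    return candidates


# Hypothetical word-for-word SBI standing in for a real MT system.
lexicon = {"the": "la", "red": "roja", "green": "verde", "house": "casa",
           "is": "es", "big": "grande", "small": "pequeña"}


def word_for_word(tokens: Tokens) -> Tokens:
    return tuple(lexicon[w] for w in tokens)


target = ("la", "casa", "roja", "es", "grande")  # TM target for "the red house is big"
mismatches = [(("red",), ("green",)), (("big",), ("small",))]  # new: "the green house is small"
for candidate in sorted(repair_candidates(target, mismatches, word_for_word)):
    print(" ".join(candidate))
```

With two applicable patches the sketch yields three candidates (each patch alone and both together), which mirrors the abstract's point that several combinations of translated sub-segments produce multiple repair candidates; choosing among them is a separate problem.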

UAlacant word-level and phrase-level machine translation quality estimation systems at WMT 2016
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel Forcada
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

UAlacant word-level machine translation quality estimation system at WMT 2015
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel Forcada
Proceedings of the Tenth Workshop on Statistical Machine Translation

Proceedings of the 18th Annual Conference of the European Association for Machine Translation
İlknur Durgar El-Kahlout | Mehmed Özkan | Felipe Sánchez-Martínez | Gema Ramírez-Sánchez | Fred Hollowood | Andy Way
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Using on-line available sources of bilingual information for word-level machine translation quality estimation
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

A general framework for minimizing translation effort: towards a principled combination of translation technologies in computer-aided translation
Mikel L. Forcada | Felipe Sánchez-Martínez
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Unsupervised training of maximum-entropy models for lexical selection in rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Using any machine translation source for fuzzy-match repair in a computer-aided translation setting
John E. Ortega | Felipe Sánchez-Martinez | Mikel L. Forcada
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

When a computer-assisted translation (CAT) tool does not find an exact match for the source segment to translate in its translation memory (TM), translators must use fuzzy matches that come from translation units in the translation memory that do not completely match the source segment. We explore the use of a fuzzy-match repair technique called patching to repair translation proposals from a TM in a CAT environment using any available machine translation system, or any external bilingual source, regardless of its internals. Patching attempts to aid CAT tool users by repairing fuzzy matches and proposing improved translations. Our results show that patching improves the quality of translation proposals and reduces the number of edit operations to be performed, especially when a specific set of restrictions is applied.

Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules
Raphael Rubino | Antonio Toral | Victor M. Sánchez-Cartagena | Jorge Ferrández-Tordera | Sergio Ortiz-Rojas | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Andy Way
Proceedings of the Ninth Workshop on Statistical Machine Translation

The UA-Prompsit hybrid machine translation system for the 2014 Workshop on Statistical Machine Translation
Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez
Proceedings of the Ninth Workshop on Statistical Machine Translation

An efficient method to assist non-expert users in extending dictionaries by assigning stems and inflectional paradigms to unknown words
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Rafael C. Carrasco | Mikel L. Forcada | Juan Antonio Pérez-Ortiz
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2012

UAlacant: Using Online Machine Translation for Cross-Lingual Textual Entailment
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Flexible finite-state lexical selection for rule-based machine translation
Francis M. Tyers | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 16th Annual conference of the European Association for Machine Translation

2011

The Universitat d’Alacant hybrid machine translation system for WMT 2011
Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz
Proceedings of the Sixth Workshop on Statistical Machine Translation

Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited
Miquel Esplà | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 15th Annual conference of the European Association for Machine Translation

Choosing the best machine translation system to translate a sentence by using only source-language information
Felipe Sánchez-Martínez
Proceedings of the 15th Annual conference of the European Association for Machine Translation

Using machine translation in computer-aided translation to suggest the target-side words to change
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of Machine Translation Summit XIII: Papers

Integrating shallow-transfer rules into phrase-based statistical machine translation
Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz
Proceedings of Machine Translation Summit XIII: Papers

Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation
Felipe Sánchez-Martinez | Juan Antonio Pérez-Ortiz
Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation

Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases
Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2009

Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation
Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martinez | Francis M. Tyers
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform
Zaid Md Abdul Wahab Sheikh | Felipe Sánchez-Martínez
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

Marker-Based Filtering of Bilingual Phrase Pairs for SMT
Felipe Sánchez-Martínez | Andy Way
Proceedings of the 13th Annual conference of the European Association for Machine Translation

2007

Automatic induction of shallow-transfer rules for open-source machine translation
Felipe Sánchez-Martínez | Mikel L. Forcada
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2005

An Open-Source Shallow-Transfer Machine Translation Toolbox: Consequences of Its Release and Availability
Carme Armentano-Oller | Antonio M. Corbí-Bellot | Mikel L. Forcada | Mireia Ginestí-Rosell | Boyan Bonev | Sergio Ortiz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez
Workshop on open-source machine translation

By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license, includes (a) the open-source engine itself, a modular shallow-transfer machine translation engine suitable for related languages and largely based upon that of systems we have already developed, such as interNOSTRUM for Spanish–Catalan and Traductor Universia for Spanish–Portuguese; (b) extensive documentation (including document type declarations) specifying the XML format of all linguistic (dictionaries, rules) and document format management files; (c) compilers converting these data into the high-speed (tens of thousands of words a second) format used by the engine; and (d) pilot linguistic data for Spanish–Catalan and Spanish–Galician and format management specifications for the HTML, RTF and plain text formats. After briefly describing this toolbox, this paper aims to explore possible consequences of the availability of this architecture, including the community-driven development of machine translation systems for languages lacking this kind of linguistic technology.

An open-source shallow-transfer machine translation engine for the Romance languages of Spain
Antonio M. Corbi-Bellot | Mikel L. Forcada | Sergio Ortíz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Iñaki Alegria | Aingeru Mayor | Kepa Sarasola
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

Cooperative unsupervised training of the part-of-speech taggers in a bidirectional machine translation system
Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz | Mikel L. Forcada
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages