Manuel Herranz
2022
English-Russian Data Augmentation for Neural Machine Translation
Nikita Teslenko Grygoryev | Mercedes Garcia Martinez | Francisco Casacuberta Nolla | Amando Estela Pastor | Manuel Herranz
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Workshop 2: Corpus Generation and Corpus Augmentation for Machine Translation)
Data Augmentation (DA) refers to strategies for increasing the diversity of training examples without explicitly collecting new data manually. We use neural networks and linguistic resources to generate Russian text automatically. The system generates new texts using information from embeddings trained on large amounts of data in neural language models. Public-domain data were used for the experiments. The generated texts enlarge the corpus used to train models for NLP tasks such as machine translation. Finally, we analyse the quality of the generated texts and add them to the training process of Neural Machine Translation (NMT) models. To evaluate the NMT models, we first compare them quantitatively, using several standard automatic machine translation metrics and measuring the time spent and the amount of text generated, with a view to practical use in the language industry. Second, we compare them qualitatively, presenting example translations side by side. With our DA method, fine-tuning NMT systems on the newly generated datasets yields better results than a baseline model.
Europeana Translate: Providing multilingual access to digital cultural heritage
Eirini Kaldeli | Mercedes García-Martínez | Antoine Isaac | Paolo Sebastiano Scalia | Arne Stabenau | Iván Lena Almor | Carmen Grau Lacal | Martín Barroso Ordóñez | Amando Estela | Manuel Herranz
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Europeana Translate is a project funded under the Connecting Europe Facility that aims to use state-of-the-art machine translation to increase the multilinguality of resources in the cultural heritage domain.
MAPA Project: Ready-to-Go Open-Source Datasets and Deep Learning Technology to Remove Identifying Information from Text Documents
Victoria Arranz | Khalid Choukri | Montse Cuadros | Aitor García Pablos | Lucie Gianola | Cyril Grouin | Manuel Herranz | Patrick Paroubek | Pierre Zweigenbaum
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference
This paper presents the outcomes of the MAPA project: a set of annotated corpora for 24 languages of the European Union and an open-source, customisable toolkit able to detect and substitute sensitive information in text documents from any domain, using state-of-the-art, deep learning-based named entity recognition techniques. In the context of the project, the toolkit has been developed and tested on administrative, legal and medical documents, obtaining state-of-the-art results. As a result of the project, 24 dataset packages have been released and the de-identification toolkit is available as open source.
2021
Neural Translation for European Union (NTEU)
Mercedes García-Martínez | Laurent Bié | Aleix Cerdà | Amando Estela | Manuel Herranz | Rihards Krišlauks | Maite Melero | Tony O’Dowd | Sinead O’Gorman | Marcis Pinnis | Artūrs Stafanovič | Riccardo Superbo | Artūrs Vasiļevskis
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
The Neural Translation for the European Union (NTEU) engine farm enables direct machine translation for all 24 official languages of the European Union without the need to use a high-resourced language as a pivot. This amounts to a total of 552 translation engines for all combinations of the 24 languages. We have collected parallel data for all the language combinations, publicly shared on elrc-share.eu. The translation engines have been customized to the domain for use by the European public administrations. The delivered engines will be published in the European Language Grid. In addition to the usual automatic metrics, all the engines have been evaluated by humans based on the direct assessment methodology. For this purpose, we built an open-source platform called MTET. The evaluation shows that most of the engines reach high quality and score better than an external machine translation service in a blind evaluation setup.
2020
A User Study of the Incremental Learning in NMT
Miguel Domingo | Mercedes García-Martínez | Álvaro Peris | Alexandre Helle | Amando Estela | Laurent Bié | Francisco Casacuberta | Manuel Herranz
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
In the translation industry, human experts usually supervise and post-edit machine translation hypotheses. Adaptive neural machine translation systems, able to incrementally update the underlying models under an online learning regime, have proven useful for improving the efficiency of this workflow. However, this incremental adaptation is somewhat unstable, and it may lead to undesirable side effects. One of them is the sporadic appearance of made-up words, a byproduct of an erroneous application of subword segmentation techniques. In this work, we extend previous studies on on-the-fly adaptation of neural machine translation systems. We perform a user study involving professional, experienced post-editors, delving deeper into the aforementioned problems. Results show that adaptive systems were able to learn how to generate the correct translation for task-specific terms, resulting in an improvement of the user’s productivity. We also observed a close morphological similarity between made-up words and the words that were expected.
The Multilingual Anonymisation Toolkit for Public Administrations (MAPA) Project
Ēriks Ajausks | Victoria Arranz | Laurent Bié | Aleix Cerdà-i-Cucó | Khalid Choukri | Montse Cuadros | Hans Degroote | Amando Estela | Thierry Etchegoyhen | Mercedes García-Martínez | Aitor García-Pablos | Manuel Herranz | Alejandro Kohan | Maite Melero | Mike Rosner | Roberts Rozis | Patrick Paroubek | Artūrs Vasiļevskis | Pierre Zweigenbaum
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
We describe the MAPA project, funded under the Connecting Europe Facility programme, whose goal is the development of an open-source de-identification toolkit for all official European Union languages. The project runs from January 2020 until December 2021.
Neural Translation for the European Union (NTEU) Project
Laurent Bié | Aleix Cerdà-i-Cucó | Hans Degroote | Amando Estela | Mercedes García-Martínez | Manuel Herranz | Alejandro Kohan | Maite Melero | Tony O’Dowd | Sinéad O’Gorman | Mārcis Pinnis | Roberts Rozis | Riccardo Superbo | Artūrs Vasiļevskis
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
The Neural Translation for the European Union (NTEU) project aims to build a neural engine farm with all European official language combinations for eTranslation, without the necessity to use a high-resourced language as a pivot. NTEU started in September 2019 and will run until August 2021.
Eco.pangeamt: Industrializing Neural MT
Mercedes García-Martínez | Manuel Herranz | Amando Estela | Ángela Franco | Laurent Bié
Proceedings of the 1st International Workshop on Language Technology Platforms
Eco is Pangeanic’s customer portal for generic or specialized translation services (machine translation and post-editing, generic API MT and custom API MT). Users can request the processing (translation) of files in different formats. Moreover, a client user can manage the engines and models, including cloning and retraining them.
2019
NEC TM Data Project
Alexandre Helle | Manuel Herranz
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
iADAATPA Project: Pangeanic use cases
Mercedes García-Martínez | Amando Estela | Laurent Bié | Alexandre Helle | Manuel Herranz
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
Large-scale Machine Translation Evaluation of the iADAATPA Project
Sheila Castilho | Natália Resende | Federico Gaspari | Andy Way | Tony O’Dowd | Marek Mazur | Manuel Herranz | Alex Helle | Gema Ramírez-Sánchez | Víctor Sánchez-Cartagena | Mārcis Pinnis | Valters Šics
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
Incremental Adaptation of NMT for Professional Post-editors: A User Study
Miguel Domingo | Mercedes García-Martínez | Álvaro Peris | Alexandre Helle | Amando Estela | Laurent Bié | Francisco Casacuberta | Manuel Herranz
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
2016
PangeaMT v 3 – customise your own machine translation environment
Alexandre Helle | Manuel Herranz
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products
2015
The EXPERT project: Advancing the state of the art in hybrid translation technologies
Constantin Orasan | Alessandro Cattelan | Gloria Corpas Pastor | Josef van Genabith | Manuel Herranz | Juan José Arevalillo | Qun Liu | Khalil Sima’an | Lucia Specia
Proceedings of Translating and the Computer 37
2013
Co-authors
- Mercedes García-Martínez 9
- Amando Estela 8
- Laurent Bié 7
- Alexandre Helle 5
- Maite Melero 3
- Tony O’Dowd 3
- Mārcis Pinnis 3
- Artūrs Vasiļevskis 3
- Victoria Arranz 2
- Francisco Casacuberta 2
- Aleix Cerdà-i-Cucó 2
- Khalid Choukri 2
- Montse Cuadros 2
- Hans Degroote 2
- Miguel Domingo 2
- Aitor García-Pablos 2
- Alex Helle 2
- Alejandro Kohan 2
- Sinéad O’Gorman 2
- Patrick Paroubek 2
- Álvaro Peris 2
- Roberts Rozis 2
- Lucia Specia 2
- Riccardo Superbo 2
- Pierre Zweigenbaum 2
- Ēriks Ajausks 1
- Iván Lena Almor 1
- Martín Barroso Ordóñez 1
- Francisco Casacuberta Nolla 1
- Sheila Castilho 1
- Alessandro Cattelan 1
- Aleix Cerdà 1
- Gloria Corpas Pastor 1
- Amando Estela Pastor 1
- Thierry Etchegoyhen 1
- Ángela Franco 1
- Federico Gaspari 1
- Lucie Gianola 1
- Cyril Grouin 1
- Antoine Isaac 1
- Juan José Arevalillo 1
- Eirini Kaldeli 1
- Rihards Krišlauks 1
- Carmen Grau Lacal 1
- Qun Liu 1
- Marek Mazur 1
- Ruslan Mitkov 1
- Constantin Orasan 1
- Gema Ramírez-Sánchez 1
- Natália Resende 1
- Michael Rosner 1
- Paolo Sebastiano Scalia 1
- Khalil Sima’an 1
- Arne Stabenau 1
- Artūrs Stafanovič 1
- Víctor Sánchez-Cartagena 1
- Nikita Teslenko Grygoryev 1
- Andy Way 1
- Elia Yuste 1
- Josef van Genabith 1
- Valters Šics 1