Gema Ramírez‐Sánchez

Also published as: Gema Ramirez-Sánchez, Gema Ramírez, Gema Ramírez-Sànchez, Gema Ramírez-Sánchez


2020

pdf bib
ParaCrawl: Web-Scale Acquisition of Parallel Corpora
Marta Bañón | Pinzhen Chen | Barry Haddow | Kenneth Heafield | Hieu Hoang | Miquel Esplà-Gomis | Mikel L. Forcada | Amir Kamran | Faheem Kirefu | Philipp Koehn | Sergio Ortiz Rojas | Leopoldo Pla Sempere | Gema Ramírez-Sánchez | Elsa Sarrías | Marek Strelec | Brian Thompson | William Waites | Dion Wiggins | Jaume Zaragoza
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering. We also describe the parallel corpora released and evaluate their quality and their usefulness to create machine translation systems.

pdf bib
Bifixer and Bicleaner: two open-source tools to clean your parallel data
Gema Ramírez-Sánchez | Jaume Zaragoza-Bernabeu | Marta Bañón | Sergio Ortiz Rojas
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper shows the utility of two open-source tools designed for parallel data cleaning: Bifixer and Bicleaner. Already used to clean highly noisy parallel content from crawled multilingual websites, we evaluate their performance in a different scenario: cleaning publicly available corpora commonly used to train machine translation systems. We choose four English–Portuguese corpora which we plan to use internally to compute paraphrases at a later stage. We clean the four corpora using both tools, which are described in detail, and analyse the effect of some of the cleaning steps on them. We then compare machine translation training times and quality before and after cleaning these corpora, showing a positive impact particularly for the noisiest ones.

2019

pdf bib
ParaCrawl: Web-scale parallel corpora for the languages of the EU
Miquel Esplà | Mikel Forcada | Gema Ramírez-Sánchez | Hieu Hoang
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

pdf bib
Large-scale Machine Translation Evaluation of the iADAATPA Project
Sheila Castilho | Natália Resende | Federico Gaspari | Andy Way | Tony O’Dowd | Marek Mazur | Manuel Herranz | Alex Helle | Gema Ramírez-Sánchez | Víctor Sánchez-Cartagena | Mārcis Pinnis | Valters Šics
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

2018

pdf bib
Same-language machine translation for local flavours/flavors
Gema Ramírez-Sánchez | Janice Campbell
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

pdf bib
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task
Víctor M. Sánchez-Cartagena | Marta Bañón | Sergio Ortiz-Rojas | Gema Ramírez
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes Prompsit Language Engineering’s submissions to the WMT 2018 parallel corpus filtering shared task. Our four submissions were based on an automatic classifier for identifying pairs of sentences that are mutual translations. A set of hand-crafted hard rules for discarding sentences with evident flaws were applied before the classifier. We explored different strategies for achieving a training corpus with diverse vocabulary and fluent sentences: language model scoring, an active-learning-inspired data selection algorithm and n-gram saturation. Our submissions were very competitive in comparison with other participants on the 100 million word training corpus.

2016

pdf bib
AltLang: an automatic converter between varieties of English, Spanish, French and Portuguese
Gema Ramírez-Sànchez
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

pdf bib
Collaborative Development of a Rule-Based Machine Translator between Croatian and Serbian
Filip Klubička | Gema Ramírez-Sánchez | Nikola Ljubešić
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Re-assessing the Impact of SMT Techniques with Human Evaluation: a Case Study on English—Croatian
Antonio Toral | Raphael Rubino | Gema Ramírez-Sánchez
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

2015

pdf bib
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
İlknur Durgar El-Kahlout | Mehmed Özkan | Felipe Sánchez-Martínez | Gema Ramírez-Sánchez | Fred Hollowood | Andy Way
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A. Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel L. Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
İIknur El‐Kahlout | Mehmed Özkan | Felipe Sánchez‐Martínez | Gema Ramírez‐Sánchez | Fred Hollywood | Andy Way
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

pdf bib
Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules
Raphael Rubino | Antonio Toral | Victor M. Sánchez-Cartagena | Jorge Ferrández-Tordera | Sergio Ortiz-Rojas | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Andy Way
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Quality Estimation for Synthetic Parallel Data Generation
Raphael Rubino | Antonio Toral | Nikola Ljubešić | Gema Ramírez-Sánchez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a novel approach for parallel data generation using machine translation and quality estimation. Our study focuses on pivot-based machine translation from English to Croatian through Slovene. We generate an English―Croatian version of the Europarl parallel corpus based on the English―Slovene Europarl corpus and the Apertium rule-based translation system for Slovene―Croatian. These experiments are to be considered as a first step towards the generation of reliable synthetic parallel data for under-resourced languages. We first collect small amounts of aligned parallel data for the Slovene―Croatian language pair in order to build a quality estimation system for sentence-level Translation Edit Rate (TER) estimation. We then infer TER scores on automatically translated Slovene to Croatian sentences and use the best translations to build an English―Croatian statistical MT system. We show significant improvement in terms of automatic metrics obtained on two test sets using our approach compared to a random selection of synthetic parallel data.

pdf bib
Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain
Antonio Toral | Raphael Rubino | Miquel Esplà-Gomis | Tommi Pirinen | Andy Way | Gema Ramírez-Sánchez
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2012

pdf bib
Opinum: statistical sentiment analysis for opinion classification
Boyan Bonev | Gema Ramírez-Sánchez | Sergio Ortiz Rojas
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

2010

pdf bib
Using the Apertium Spanish-Brazilian Portuguese machine translation system for localization
François Masselot | Petra Ribiczey | Gema Ramírez-Sánchez
Proceedings of the 14th Annual conference of the European Association for Machine Translation

2009

pdf bib
The Apertium machine translation platform: Five years on
Mikel L. Forcada | Francis M. Tyers | Gema Ramírez-Sánchez
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

2006

pdf bib
Opentrad Apertium Open-Source Machine Translation System: an Opportunity for Business and Research
Gema Ramirez-Sánchez
Proceedings of Translating and the Computer 28

2005

pdf bib
An Open-Source Shallow-Transfer Machine Translation Toolbox: Consequences of Its Release and Availability
Carme Armentano-Oller | Antonio M. Corbí-Bellot | Mikel L. Forcada | Mireia Ginestí-Rosell | Boyan Bonev | Sergio Ortiz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez
Workshop on open-source machine translation

By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license includes (a) the open-source engine itself, a modular shallow-transfer machine translation engine suitable for related languages and largely based upon that of systems we have already developed, such as interNOSTRUM for Spanish—Catalan and Traductor Universia for Spanish—Portuguese, (b) extensive documentation (including document type declarations) specifying the XML format of all linguistic (dictionaries, rules) and document format management files, (c) compilers converting these data into the high-speed (tens of thousands of words a second) format used by the engine, and (d) pilot linguistic data for Spanish—Catalan and Spanish—Galician and format management specifications for the HTML, RTF and plain text formats. After describing very briefly this toolbox, this paper aims at exploring possible consequences of the availability of this architecture, including the community-driven development of machine translation systems for languages lacking this kind of linguistic technology.

pdf bib
An open-source shallow-transfer machine translation engine for the Romance languages of Spain
Antonio M. Corbi-Bellot | Mikel L. Forcada | Sergio Ortíz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Iñaki Alegria | Aingeru Mayor | Kepa Sarasola
Proceedings of the 10th EAMT Conference: Practical applications of machine translation