2024
Segmentation-Free Streaming Machine Translation
Javier Iranzo-Sánchez | Jorge Iranzo-Sánchez | Adrià Giménez | Jorge Civera | Alfons Juan
Transactions of the Association for Computational Linguistics, Volume 12
Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) and an MT system, relies on an intermediate segmentation step which splits the transcription stream into sentence-like units. However, the incorporation of a hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until after the translation has been generated. Extensive experiments show that the proposed Segmentation-Free framework has a better quality-latency trade-off than competing approaches that use an independent segmentation model.
2022
From Simultaneous to Streaming Machine Translation by Leveraging Streaming History
Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Simultaneous Machine Translation is the task of incrementally translating an input sentence before it is fully available. Currently, simultaneous translation is carried out by translating each sentence independently of the previously translated text. More generally, Streaming MT can be understood as an extension of Simultaneous MT to the incremental translation of a continuous input text stream. In this work, a state-of-the-art simultaneous sentence-level MT system is extended to the streaming setup by leveraging the streaming history. Extensive empirical results are reported on IWSLT Translation Tasks, showing that leveraging the streaming history leads to significant quality gains. In particular, the proposed system proves to compare favorably to the best performing systems.
2019
The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task
Javier Iranzo-Sánchez | Gonçal Garcés Díaz-Munío | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German ↔ English and German ↔ French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning.
The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task
Pau Baquero-Arnal | Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We have submitted systems for the Portuguese ↔ Spanish language pair, in both directions. Our submissions include systems based on the Transformer architecture as well as a novel architecture, still under development, which we have called 2D alternating RNN. We have carried out domain adaptation through fine-tuning.
2018
The MLLP-UPV German-English Machine Translation System for WMT18
Javier Iranzo-Sánchez | Pau Baquero-Arnal | Gonçal V. Garcés Díaz-Munío | Adrià Martínez-Villaronga | Jorge Civera | Alfons Juan
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German→English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture–based neural machine translation systems. To train our system under “constrained” conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.
2010
Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Jesús González-Rubio | Jorge Civera | Alfons Juan | Francisco Casacuberta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Currently, a great effort is being made to digitalise large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the access of the general public to their content due to the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also at annotating them with their corresponding text transcriptions and translations into modern languages. Unfortunately, few electronic resources are available for ancient languages that natural language processing techniques can exploit. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging but interesting nature for future development.
2009
Statistical Approaches to Computer-Assisted Translation
Sergio Barrachina | Oliver Bender | Francisco Casacuberta | Jorge Civera | Elsa Cubel | Shahram Khadivi | Antonio Lagarda | Hermann Ney | Jesús Tomás | Enrique Vidal | Juan-Miguel Vilar
Computational Linguistics, Volume 35, Number 1, March 2009
2008
Bilingual Text Classification using the IBM 1 Translation Model
Jorge Civera | Alfons Juan-Císcar
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Manual categorisation of documents is a time-consuming task that has been significantly alleviated with the deployment of automatic and machine-aided text categorisation systems. However, the proliferation of multilingual documentation has become a common phenomenon in many international organisations, while most of the current systems have focused on the categorisation of monolingual text. It has been recently shown that the inherent redundancy in bilingual documents can be effectively exploited by relatively simple, bilingual naive Bayes (multinomial) models. In this work, we present a refined version of these models in which this redundancy is explicitly captured by a combination of a unigram (multinomial) model and the well-known IBM 1 translation model. The proposed model is evaluated on two bilingual classification tasks and compared to previous work.
Improving Interactive Machine Translation via Mouse Actions
Germán Sanchis-Trilles | Daniel Ortiz-Martínez | Jorge Civera | Francisco Casacuberta | Enrique Vidal | Hieu Hoang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing
2007
Domain Adaptation in Statistical Machine Translation with Mixture Modelling
Jorge Civera | Alfons Juan
Proceedings of the Second Workshop on Statistical Machine Translation
2006
Bilingual Machine-Aided Indexing
Jorge Civera | Alfons Juan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The proliferation of multilingual documentation in our Information Society has become a common phenomenon. This documentation is usually categorised by hand, a time-consuming and arduous task. This is particularly true in the case of keyword assignment, in which a list of keywords (descriptors) from a controlled vocabulary (thesaurus) is assigned to a document. A possible solution to alleviate this problem is offered by so-called Machine-Aided Indexing (MAI) systems. These systems work in cooperation with professional indexers by providing an initial list of descriptors from which the most appropriate ones are selected. This way of proceeding increases the productivity and eases the task of indexers. In this paper, we propose a statistical text classification framework for bilingual documentation, from which we derive two novel bilingual classifiers based on the naive combination of monolingual classifiers. We report preliminary results on the multilingual corpus Acquis Communautaire (AC) that demonstrate the suitability of the proposed classifiers as the backend of a fully-working MAI system.
A Computer-Assisted Translation Tool based on Finite-State Technology
Jorge Civera | Antonio L. Lagarda | Elsa Cubel | Francisco Casacuberta | Enrique Vidal | Juan M. Vilar | Sergio Barrachina
Proceedings of the 11th Annual Conference of the European Association for Machine Translation
Mixtures of IBM Model 2
Jorge Civera | Alfons Juan
Proceedings of the 11th Annual Conference of the European Association for Machine Translation
2004
From Machine Translation to Computer Assisted Translation using Finite-State Models
Jorge Civera | Elsa Cubel | Antonio L. Lagarda | David Picó | Jorge González | Enrique Vidal | Francisco Casacuberta | Juan M. Vilar | Sergio Barrachina
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing