Alfons Juan

Also published as: Alfons Juan-Císcar

2025

MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation Translation task
Jorge Iranzo-Sánchez | Javier Iranzo-Sanchez | Adrià Giménez Pastor | Jorge Civera Saiz | Alfons Juan
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)

This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2025 Simultaneous Speech Translation track. Our submission addresses the unique challenges of real-time translation of long-form speech by developing a modular cascade system that adapts strong pre-trained models to streaming scenarios. We combine Whisper Large-V3-Turbo for ASR with the multilingual NLLB-3.3B model for MT, implementing lightweight adaptation techniques rather than training new end-to-end models from scratch. Our approach employs document-level adaptation with prefix training to enhance the MT model’s ability to handle incomplete inputs, while incorporating adaptive emission policies including a wait-k strategy and RALCP for managing the translation stream. Specialized buffer management techniques and segmentation strategies ensure coherent translations across long audio sequences. Experimental results on the ACL60/60 dataset demonstrate that our system achieves a favorable balance between translation quality and latency, with a BLEU score of 31.96 and non-computational-aware StreamLAAL latency of 2.94 seconds. Our final model achieves a preliminary score on the official test set (IWSLT25Instruct) of 29.8 BLEU. Our work demonstrates that carefully adapted pre-trained components can create effective simultaneous translation systems for long-form content without requiring extensive in-domain parallel data or specialized end-to-end training.

2024

pdf bib abs

Segmentation-Free Streaming Machine Translation
Javier Iranzo-Sánchez | Jorge Iranzo-Sánchez | Adrià Giménez | Jorge Civera | Alfons Juan
Transactions of the Association for Computational Linguistics, Volume 12

Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real-time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) and an MT system, relies on an intermediate segmentation step which splits the transcription stream into sentence-like units. However, the incorporation of a hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until after the translation has been generated. Extensive experiments show how the proposed Segmentation-Free framework has better quality-latency trade-off than competing approaches that use an independent segmentation model.1

2022

pdf bib abs

From Simultaneous to Streaming Machine Translation by Leveraging Streaming History
Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Simultaneous Machine Translation is the task of incrementally translating an input sentence before it is fully available. Currently, simultaneous translation is carried out by translating each sentence independently of the previously translated text. More generally, Streaming MT can be understood as an extension of Simultaneous MT to the incremental translation of a continuous input text stream. In this work, a state-of-the-art simultaneous sentence-level MT system is extended to the streaming setup by leveraging the streaming history. Extensive empirical results are reported on IWSLT Translation Tasks, showing that leveraging the streaming history leads to significant quality gains. In particular, the proposed system proves to compare favorably to the best performing systems.

pdf bib abs

MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks
Javier Iranzo-Sánchez | Javier Jorge Cano | Alejandro Pérez-González-de-Martos | Adrián Giménez Pastor | Gonçal V. Garcés Díaz-Munío | Pau Baquero-Arnal | Joan Albert Silvestre-Cerdà | Jorge Civera Saiz | Albert Sanchis | Alfons Juan
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This work describes the participation of the MLLP-VRAIN research group in the two shared tasks of the IWSLT 2022 conference: Simultaneous Speech Translation and Speech-to-Speech Translation. We present our streaming-ready ASR, MT and TTS systems for Speech Translation and Synthesis from English into German. Our submission combines these systems by means of a cascade approach paying special attention to data preparation and decoding for streaming inference.

2021

pdf bib abs

Stream-level Latency Evaluation for Simultaneous Machine Translation
Javier Iranzo-Sánchez | Jorge Civera Saiz | Alfons Juan
Findings of the Association for Computational Linguistics: EMNLP 2021

Simultaneous machine translation has recently gained traction thanks to significant quality improvements and the advent of streaming applications. Simultaneous translation systems need to find a trade-off between translation quality and response time, and with this purpose multiple latency measures have been proposed. However, latency evaluations for simultaneous translation are estimated at the sentence level, not taking into account the sequential nature of a streaming scenario. Indeed, these sentence-level latency measures are not well suited for continuous stream translation, resulting in figures that are not coherent with the simultaneous translation policy of the system being assessed. This work proposes a stream level adaptation of the current latency measures based on a re-segmentation approach applied to the output translation, that is successfully evaluated on streaming conditions for a reference IWSLT task.

2020

pdf bib abs

Direct Segmentation Models for Streaming Speech Translation
Javier Iranzo-Sánchez | Adrià Giménez Pastor | Joan Albert Silvestre-Cerdà | Pau Baquero-Arnal | Jorge Civera Saiz | Alfons Juan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. These systems are usually connected by a segmenter that splits the ASR output into hopefully, semantically self-contained chunks to be fed into the MT system. This is specially challenging in the case of streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that incorporate not only textual, but also acoustic information to decide when the ASR output is split into a chunk. An extensive and throughly experimental setup is carried out on the Europarl-ST dataset to prove the contribution of acoustic information to the performance of the segmentation model in terms of BLEU score in a streaming ST scenario. Finally, comparative results with previous work also show the superiority of the segmentation models proposed in this work.

2019

pdf bib abs

The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task
Pau Baquero-Arnal | Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We have submitted systems for the Portuguese ↔ Spanish language pair, in both directions. We have submitted systems based on the Transformer architecture as well as an in development novel architecture which we have called 2D alternating RNN. We have carried out domain adaptation through fine-tuning.

pdf bib abs

The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task
Javier Iranzo-Sánchez | Gonçal V. Garcés Díaz-Munío | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German ↔ English and German ↔ French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning.

2018

pdf bib abs

The MLLP-UPV German-English Machine Translation System for WMT18
Javier Iranzo-Sánchez | Pau Baquero-Arnal | Gonçal V. Garcés Díaz-Munío | Adrià Martínez-Villaronga | Jorge Civera | Alfons Juan
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German→English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture–based neural machine translation systems. To train our system under “constrained” conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.

2015

pdf bib

The MLLP ASR systems for IWSLT 2015
Miguel Ángel Del Agua Teba | Adrià Agusti Martinez Villaronga | Santiago Piqueras Gozalbes | Adrià Giménez Pastor | José Alberto Sanchis Navarro | Jorge Civera Saiz | Alfons Juan-Císcar
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

2014

pdf bib abs

For the task of online translation of scientific video lectures, using huge models is not possible. In order to get smaller and efficient models, we perform data selection. In this paper, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from both their strengths.

2011

pdf bib

Minimum Bayes-risk System Combination
Jesús González-Rubio | Alfons Juan | Francisco Casacuberta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib abs

Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT
Jesús González-Rubio | Jorge Civera | Alfons Juan | Francisco Casacuberta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, a great effort is being carried out in the digitalisation of large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, which limits the access of the general public to their content due to the language barrier. Therefore, digital libraries aim not only at storing raw images of digitalised documents, but also to annotate them with their corresponding text transcriptions and translations into modern languages. Unfortunately, ancient languages have at their disposal scarce electronic resources to be exploited by natural language processing techniques. This paper describes the compilation process of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results are also reported using a state-of-the-art phrase-based SMT system. The results presented in this work reveal the complexity of the task and its challenging, but interesting nature for future development.

pdf bib abs

The RODRIGO Database
Nicolas Serrano | Francisco Castro | Alfons Juan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Annotation of digitized pages from historical document collections is very important to research on automatic extraction of text blocks, lines, and handwriting recognition. We have recently introduced a new handwritten text database, GERMANA, which is based on a Spanish manuscript from 1891. To our knowledge, GERMANA is the first publicly available database mostly written in Spanish and comparable in size to standard databases. In this paper, we present another handwritten text database, RODRIGO, completely written in Spanish and comparable in size to GERMANA. However, RODRIGO comes from a much older manuscript, from 1545, where the typical difficult characteristics of historical documents are more evident. In particular, the writing style, which has clear Gothic influences, is significantly more complex than that of GERMANA. We also provide baseline results of handwriting recognition for reference in future studies, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.

2009

pdf bib

A Phrase-Based Hidden Semi-Markov Approach to Machine Translation
Jesús Andrés-Ferrer | Alfons Juan
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

pdf bib

A novel alignment model inspired on IBM Model 1
Jesús González-Rubio | Germán Sanchis-Trilles | Alfons Juan | Francisco Casacuberta
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

pdf bib abs

Bilingual Text Classification using the IBM 1 Translation Model
Jorge Civera | Alfons Juan-Císcar
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Manual categorisation of documents is a time-consuming task that has been significantly alleviated with the deployment of automatic and machine-aided text categorisation systems. However, the proliferation of multilingual documentation has become a common phenomenon in many international organisations, while most of the current systems have focused on the categorisation of monolingual text. It has been recently shown that the inherent redundancy in bilingual documents can be effectively exploited by relatively simple, bilingual naive Bayes (multinomial) models. In this work, we present a refined version of these models in which this redundancy is explicitly captured by a combination of a unigram (multinomial) model and the well-known IBM 1 translation model. The proposed model is evaluated on two bilingual classification tasks and compared to previous work.

2007

pdf bib

Estimation of confidence measures for machine translation
Alberto Sanchis | Alfons Juan | Enrique Vidal
Proceedings of Machine Translation Summit XI: Papers

pdf bib

Domain Adaptation in Statistical Machine Translation with Mixture Modelling
Jorge Civera | Alfons Juan
Proceedings of the Second Workshop on Statistical Machine Translation

2006

pdf bib abs

Bilingual Machine-Aided Indexing
Jorge Civera | Alfons Juan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The proliferation of multilingual documentation in our Information Society has become a common phenomenon. This documentation is usually categorised by hand, entailing a time-consuming and arduous burden. This is particularly true in the case of keyword assignment, in which a list of keywords (descriptors) from a controlled vocabulary (thesaurus) is assigned to a document. A possible solution to alleviate this problem comes from the hand of the so-called Machine-Aided Indexing (MAI) systems. These systems work in cooperation with professional indexer by providing a initial list of descriptors from which those most appropiated will be selected. This way of proceeding increases the productivity and eases the task of indexers. In this paper, we propose a statistical text classification framework for bilingual documentation, from which we derive two novel bilingual classifiers based on the naive combination of monolingual classifiers. We report preliminary results on the multilingual corpus Acquis Communautaire (AC) that demonstrates the suitability of the proposed classifiers as the backend of a fully-working MAI system.

pdf bib

Mixtures of IBM Model 2
Jorge Civera | Alfons Juan
Proceedings of the 11th Annual Conference of the European Association for Machine Translation