Javier Iranzo-Sánchez


2024

Segmentation-Free Streaming Machine Translation
Javier Iranzo-Sánchez | Jorge Iranzo-Sánchez | Adrià Giménez | Jorge Civera | Alfons Juan
Transactions of the Association for Computational Linguistics, Volume 12

Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) system and an MT system, relies on an intermediate segmentation step that splits the transcription stream into sentence-like units. However, the incorporation of a hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until after the translation has been generated. Extensive experiments show that the proposed Segmentation-Free framework achieves a better quality-latency trade-off than competing approaches that use an independent segmentation model.

Streaming Neural Speech Translation
Javier Iranzo-Sánchez
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

EAMT 2023 Thesis Award submission for Javier Iranzo-Sánchez.

2023

Speech Translation with Style: AppTek’s Submissions to the IWSLT Subtitling and Formality Tracks in 2023
Parnia Bahar | Patrick Wilken | Javier Iranzo-Sánchez | Mattia Di Gangi | Evgeny Matusov | Zoltán Tüske
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

AppTek participated in the subtitling and formality tracks of the IWSLT 2023 evaluation. This paper describes the details of our subtitling pipeline - speech segmentation, speech recognition, punctuation prediction and inverse text normalization, text machine translation and direct speech-to-text translation, intelligent line segmentation - and how we make use of the provided subtitling-specific data in training and fine-tuning. The evaluation results show that our final submissions are competitive, in particular outperforming the submissions by other participants by 5% absolute as measured by the SubER subtitle quality metric. For the formality track, we participate with our En-Ru and En-Pt production models, which support formality control via prefix tokens. Except for informal Portuguese, we achieve near-perfect formality-level accuracy while at the same time offering high general translation quality.

VivesDebate-Speech: A Corpus of Spoken Argumentation to Leverage Audio Features for Argument Mining
Ramon Ruiz-Dolz | Javier Iranzo-Sánchez
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In this paper, we describe VivesDebate-Speech, a corpus of spoken argumentation created to leverage audio features for argument mining tasks. The creation of this corpus represents an important contribution to the intersection of the speech processing and argument mining communities, and it is one of the most complete publicly available resources on this topic. Moreover, we have performed a set of first-of-their-kind experiments which show an improvement when integrating audio features into the argument mining pipeline. The provided results can be used as a baseline for future research.

2022

From Simultaneous to Streaming Machine Translation by Leveraging Streaming History
Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Simultaneous Machine Translation is the task of incrementally translating an input sentence before it is fully available. Currently, simultaneous translation is carried out by translating each sentence independently of the previously translated text. More generally, Streaming MT can be understood as an extension of Simultaneous MT to the incremental translation of a continuous input text stream. In this work, a state-of-the-art simultaneous sentence-level MT system is extended to the streaming setup by leveraging the streaming history. Extensive empirical results are reported on IWSLT Translation Tasks, showing that leveraging the streaming history leads to significant quality gains. In particular, the proposed system proves to compare favorably to the best performing systems.

MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks
Javier Iranzo-Sánchez | Javier Jorge Cano | Alejandro Pérez-González-de-Martos | Adrián Giménez Pastor | Gonçal Garcés Díaz-Munío | Pau Baquero-Arnal | Joan Albert Silvestre-Cerdà | Jorge Civera Saiz | Albert Sanchis | Alfons Juan
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This work describes the participation of the MLLP-VRAIN research group in the two shared tasks of the IWSLT 2022 conference: Simultaneous Speech Translation and Speech-to-Speech Translation. We present our streaming-ready ASR, MT and TTS systems for Speech Translation and Synthesis from English into German. Our submission combines these systems by means of a cascade approach paying special attention to data preparation and decoding for streaming inference.

2021

Stream-level Latency Evaluation for Simultaneous Machine Translation
Javier Iranzo-Sánchez | Jorge Civera Saiz | Alfons Juan
Findings of the Association for Computational Linguistics: EMNLP 2021

Simultaneous machine translation has recently gained traction thanks to significant quality improvements and the advent of streaming applications. Simultaneous translation systems need to find a trade-off between translation quality and response time, and for this purpose multiple latency measures have been proposed. However, latency evaluations for simultaneous translation are estimated at the sentence level, not taking into account the sequential nature of a streaming scenario. Indeed, these sentence-level latency measures are not well suited for continuous stream translation, resulting in figures that are not coherent with the simultaneous translation policy of the system being assessed. This work proposes a stream-level adaptation of the current latency measures based on a re-segmentation approach applied to the output translation, which is successfully evaluated under streaming conditions for a reference IWSLT task.

2020

Direct Segmentation Models for Streaming Speech Translation
Javier Iranzo-Sánchez | Adrià Giménez Pastor | Joan Albert Silvestre-Cerdà | Pau Baquero-Arnal | Jorge Civera Saiz | Alfons Juan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. These systems are usually connected by a segmenter that splits the ASR output into chunks that are, ideally, semantically self-contained, to be fed into the MT system. This is especially challenging in the case of streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that incorporate not only textual, but also acoustic information to decide when the ASR output is split into a chunk. An extensive and thorough experimental setup is carried out on the Europarl-ST dataset to prove the contribution of acoustic information to the performance of the segmentation model in terms of BLEU score in a streaming ST scenario. Finally, comparative results with previous work also show the superiority of the segmentation models proposed in this work.

2019

The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task
Javier Iranzo-Sánchez | Gonçal Garcés Díaz-Munío | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German ↔ English and German ↔ French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning.

The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task
Pau Baquero-Arnal | Javier Iranzo-Sánchez | Jorge Civera | Alfons Juan
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We have submitted systems for the Portuguese ↔ Spanish language pair, in both directions. Our submissions include systems based on the Transformer architecture as well as a novel architecture, still in development, which we have called 2D alternating RNN. We have carried out domain adaptation through fine-tuning.

2018

The MLLP-UPV German-English Machine Translation System for WMT18
Javier Iranzo-Sánchez | Pau Baquero-Arnal | Gonçal V. Garcés Díaz-Munío | Adrià Martínez-Villaronga | Jorge Civera | Alfons Juan
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German→English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture–based neural machine translation systems. To train our system under “constrained” conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.