Alina Karakanta


2023

pdf bib
Direct Speech Translation for Automatic Subtitling
Sara Papi | Marco Gaido | Alina Karakanta | Mauro Cettolo | Matteo Negri | Marco Turchi
Transactions of the Association for Computational Linguistics, Volume 11

Automatic subtitling is the task of automatically translating the speech of audiovisual content into short pieces of timed text, i.e., subtitles and their corresponding timestamps. The generated subtitles need to conform to space and time requirements, while being synchronized with the speech and segmented in a way that facilitates comprehension. Given its considerable complexity, the task has so far been addressed through a pipeline of components that separately deal with transcribing, translating, and segmenting text into subtitles, as well as predicting timestamps. In this paper, we propose the first direct speech translation model for automatic subtitling that generates subtitles in the target language along with their timestamps with a single model. Our experiments on 7 language pairs show that our approach outperforms a cascade system in the same data condition, also being competitive with production tools on both in-domain and newly released out-domain benchmarks covering new scenarios.

2022

pdf bib
Post-editing in Automatic Subtitling: A Subtitlers’ perspective
Alina Karakanta | Luisa Bentivogli | Mauro Cettolo | Matteo Negri | Marco Turchi
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Recent developments in machine translation and speech translation are opening up opportunities for computer-assisted translation tools with extended automation functions. Subtitling tools are recently being adapted for post-editing by providing automatically generated subtitles, and featuring not only machine translation, but also automatic segmentation and synchronisation. But what do professional subtitlers think of post-editing automatically generated subtitles? In this work, we conduct a survey to collect subtitlers’ impressions and feedback on the use of automatic subtitling in their workflows. Our findings show that, despite current limitations stemming mainly from speech processing errors, automatic subtitling is seen rather positively and has potential for the future.

pdf bib
Towards a methodology for evaluating automatic subtitling
Alina Karakanta | Luisa Bentivogli | Mauro Cettolo | Matteo Negri | Marco Turchi
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

In response to the increasing interest towards automatic subtitling, this EAMT-funded project aimed at collecting subtitle post-editing data in a real use case scenario where professional subtitlers edit automatically generated subtitles. The post-editing setting includes, for the first time, automatic generation of timestamps and segmentation, and focuses on the effect of timing and segmentation edits on the post-editing process. The collected data will serve as the basis for investigating how subtitlers interact with automatic subtitling and for devising evaluation methods geared to the multimodal nature and formal requirements of subtitling.

pdf bib
Extending the MuST-C Corpus for a Comparative Evaluation of Speech Translation Technology
Luisa Bentivogli | Mauro Cettolo | Marco Gaido | Alina Karakanta | Matteo Negri | Marco Turchi
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This project aimed at extending the test sets of the MuST-C speech translation (ST) corpus with new reference translations. The new references were collected from professional post-editors working on the output of different ST systems for three language pairs: English-German/Italian/Spanish. In this paper, we shortly describe how the data were collected and how they are distributed. As an evidence of their usefulness, we also summarise the findings of the first comparative evaluation of cascade and direct ST approaches, which was carried out relying on the collected data. The project was partially funded by the European Association for Machine Translation (EAMT) through its 2020 Sponsorship of Activities programme.

pdf bib
Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora
Sara Papi | Alina Karakanta | Matteo Negri | Marco Turchi
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant to specific displaying guidelines. Similar to speech translation (ST), model training requires parallel data comprising audio inputs paired with their textual translations. In SubST, however, the text has to be also annotated with subtitle breaks. So far, this requirement has represented a bottleneck for system development, as confirmed by the dearth of publicly available SubST corpora. To fill this gap, we propose a method to convert existing ST corpora into SubST resources without human intervention. We build a segmenter model that automatically segments texts into proper subtitles by exploiting audio and text in a multimodal fashion, achieving high segmentation quality in zero-shot conditions. Comparative experiments with SubST systems respectively trained on manual and automatic segmentations result in similar performance, showing the effectiveness of our approach.

pdf bib
Evaluating Subtitle Segmentation for End-to-end Generation Systems
Alina Karakanta | François Buet | Mauro Cettolo | François Yvon
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard segmentation metrics cannot be applied when systems generate outputs different than the reference, e.g. with end-to-end subtitling systems. In this paper, we study ways to conduct reference-based evaluations of segmentation accuracy irrespective of the textual content. We first conduct a systematic analysis of existing metrics for evaluating subtitle segmentation. We then introduce Sigma, a Subtitle Segmentation Score derived from an approximate upper-bound of BLEU on segmentation boundaries, which allows us to disentangle the effect of good segmentation from text quality. To compare Sigma with existing metrics, we further propose a boundary projection method from imperfect hypotheses to the true reference. Results show that all metrics are able to reward high quality output but for similar outputs system ranking depends on each metric’s sensitivity to error type. Our thorough analyses suggest Sigma is a promising segmentation candidate but its reliability over other segmentation metrics remains to be validated through correlations with human judgements.

2021

pdf bib
Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?
Luisa Bentivogli | Mauro Cettolo | Marco Gaido | Alina Karakanta | Alberto Martinelli | Matteo Negri | Marco Turchi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions. In light of this steady progress, can we claim that the performance gap between the two is closed? Starting from this question, we present a systematic comparison between state-of-the-art systems representative of the two paradigms. Focusing on three language directions (English-German/Italian/Spanish), we conduct automatic and manual evaluations, exploiting high-quality professional post-edits and annotations. Our multi-faceted analysis on one of the few publicly available ST benchmarks attests for the first time that: i) the gap between the two paradigms is now closed, and ii) the subtle differences observed in their behavior are not sufficient for humans neither to distinguish them nor to prefer one over the other.

pdf bib
Between Flexibility and Consistency: Joint Generation of Captions and Subtitles
Alina Karakanta | Marco Gaido | Matteo Negri | Marco Turchi
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e. captions). However, the joint generation of source captions and target subtitles does not only bring potential output quality advantages when the two decoding processes inform each other, but it is also often required in multilingual scenarios. In this work, we focus on ST models which generate consistent captions-subtitles in terms of structure and lexical content. We further introduce new metrics for evaluating subtitling consistency. Our findings show that joint decoding leads to increased performance and consistency between the generated captions and subtitles while still allowing for sufficient flexibility to produce subtitles conforming to language-specific needs and norms.

pdf bib
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta | Sara Papi | Matteo Negri | Marco Turchi
Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW)

With the increased audiovisualisation of communication, the need for live subtitles in multilingual events is more relevant than ever. In an attempt to automatise the process, we aim at exploring the feasibility of simultaneous speech translation (SimulST) for live subtitling. However, the word-for-word rate of generation of SimulST systems is not optimal for displaying the subtitles in a comprehensible and readable way. In this work, we adapt SimulST systems to predict subtitle breaks along with the translation. We then propose a display mode that exploits the predicted break structure by presenting the subtitles in scrolling lines. We compare our proposed mode with a display 1) word-for-word and 2) in blocks, in terms of reading speed and delay. Experiments on three language pairs (en→it, de, fr) show that scrolling lines is the only mode achieving an acceptable reading speed while keeping delay close to a 4-second threshold. We argue that simultaneous translation for readable live subtitles still faces challenges, the main one being poor translation quality, and propose directions for steering future research.

2020

pdf bib
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Alina Karakanta | Atul Kr. Ojha | Chao-Hong Liu | Jade Abbott | John Ortega | Jonathan Washington | Nathaniel Oco | Surafel Melaku Lakew | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

pdf bib
Findings of the LoResMT 2020 Shared Task on Zero-Shot for Low-Resource languages
Atul Kr. Ojha | Valentin Malykh | Alina Karakanta | Chao-Hong Liu
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

This paper presents the findings of the LoResMT 2020 Shared Task on zero-shot translation for low resource languages. This task was organised as part of the 3rd Workshop on Technologies for MT of Low Resource Languages (LoResMT) at AACL-IJCNLP 2020. The focus was on the zero-shot approach as a notable development in Neural Machine Translation to build MT systems for language pairs where parallel corpora are small or even non-existent. The shared task experience suggests that back-translation and domain adaptation methods result in better accuracy for small-size datasets. We further noted that, although translation between similar languages is no cakewalk, linguistically distinct languages require more data to give better results.

pdf bib
The Two Shades of Dubbing in Neural Machine Translation
Alina Karakanta | Supratik Bhattacharya | Shravan Nayak | Timo Baumann | Matteo Negri | Marco Turchi
Proceedings of the 28th International Conference on Computational Linguistics

Dubbing has two shades; synchronisation constraints are applied only when the actor’s mouth is visible on screen, while the translation is unconstrained for off-screen dubbing. Consequently, different synchronisation requirements, and therefore translation strategies, are applied depending on the type of dubbing. In this work, we manually annotate an existing dubbing corpus (Heroes) for this dichotomy. We show that, even though we did not observe distinctive features between on- and off-screen dubbing at the textual level, on-screen dubbing is more difficult for MT (-4 BLEU points). Moreover, synchronisation constraints dramatically decrease translation quality for off-screen dubbing. We conclude that, distinguishing between on-screen and off-screen dubbing is necessary for determining successful strategies for dubbing-customised Machine Translation.

pdf bib
MuST-Cinema: a Speech-to-Subtitles corpus
Alina Karakanta | Matteo Negri | Marco Turchi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Growing needs in localising audiovisual content in multiple languages through subtitles call for the development of automatic solutions for human subtitling. Neural Machine Translation (NMT) can contribute to the automatisation of subtitling, facilitating the work of human subtitlers and reducing turn-around times and related costs. NMT requires high-quality, large, task-specific training data. The existing subtitling corpora, however, are missing both alignments to the source language audio and important information about subtitle breaks. This poses a significant limitation for developing efficient automatic approaches for subtitling, since the length and form of a subtitle directly depends on the duration of the utterance. In this work, we present MuST-Cinema, a multilingual speech translation corpus built from TED subtitles. The corpus is comprised of (audio, transcription, translation) triplets. Subtitle breaks are preserved by inserting special symbols. We show that the corpus can be used to build models that efficiently segment sentences into subtitles and propose a method for annotating existing subtitling corpora with subtitle breaks, conforming to the constraint of length.

pdf bib
Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?
Alina Karakanta | Matteo Negri | Marco Turchi
Proceedings of the 17th International Conference on Spoken Language Translation

Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting and segmenting the text into proper subtitles. Creating proper subtitles in terms of timing and segmentation highly depends on information present in the audio (utterance duration, natural pauses). In this work, we explore two methods for applying Speech Translation (ST) to subtitling, a) a direct end-to-end and b) a classical cascade approach. We discuss the benefit of having access to the source language speech for improving the conformity of the generated subtitles to the spatial and temporal subtitling constraints and show that length is not the answer to everything in the case of subtitling-oriented ST.

2019

pdf bib
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages
Alina Karakanta | Atul Kr. Ojha | Chao-Hong Liu | Jonathan Washington | Nathaniel Oco | Surafel Melaku Lakew | Valentin Malykh | Xiaobing Zhao
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages

pdf bib
Adapting Multilingual Neural Machine Translation to Unseen Languages
Surafel M. Lakew | Alina Karakanta | Marcello Federico | Matteo Negri | Marco Turchi
Proceedings of the 16th International Conference on Spoken Language Translation

Multilingual Neural Machine Translation (MNMT) for low- resource languages (LRL) can be enhanced by the presence of related high-resource languages (HRL), but the relatedness of HRL usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to a LRL has shown to greatly improve performance. In this work, we explore the problem of adapting an MNMT model to an unseen LRL using data selection and model adapta- tion. In order to improve NMT for LRL, we employ perplexity to select HRL data that are most similar to the LRL on the basis of language distance. We extensively explore data selection in popular multilingual NMT settings, namely in (zero-shot) translation, and in adaptation from a multilingual pre-trained model, for both directions (LRL↔en). We further show that dynamic adaptation of the model’s vocabulary results in a more favourable segmentation for the LRL in comparison with direct adaptation. Experiments show re- ductions in training time and significant performance gains over LRL baselines, even with zero LRL data (+13.0 BLEU), up to +17.0 BLEU for pre-trained multilingual model dynamic adaptation with related data selection. Our method outperforms current approaches, such as massively multilingual models and data augmentation, on four LRL.

2016

pdf bib
Using Related Languages to Enhance Statistical Language Models
Anna Currey | Alina Karakanta | Jon Dehdari
Proceedings of the NAACL Student Research Workshop