Roldano Cattoni


2021

FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, and (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, its data and evaluation metrics, and reports the results of the received submissions.

2020

FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN
Ebrahim Ansari | Amittai Axelrod | Nguyen Bach | Ondřej Bojar | Roldano Cattoni | Fahim Dalvi | Nadir Durrani | Marcello Federico | Christian Federmann | Jiatao Gu | Fei Huang | Kevin Knight | Xutai Ma | Ajay Nagesh | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Xing Shi | Sebastian Stüker | Marco Turchi | Alexander Waibel | Changhan Wang
Proceedings of the 17th International Conference on Spoken Language Translation

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured six challenge tracks this year: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation. A total of teams participated in at least one of the tracks. This paper introduces each track’s goal, data and evaluation metrics, and reports the results of the received submissions.

Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
Luisa Bentivogli | Beatrice Savoldi | Matteo Negri | Mattia A. Di Gangi | Roldano Cattoni | Marco Turchi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained by the fact that the input sentence does not always contain clues about the gender identity of the referred human entities. But what happens with speech translation, where the input is an audio signal? Can audio provide additional information to reduce gender bias? We present the first thorough investigation of gender bias in speech translation, contributing with: i) the release of a benchmark useful for future studies, and ii) the comparison of different technologies (cascade and end-to-end) on two language directions (English-Italian/French).

2019

MuST-C: a Multilingual Speech Translation Corpus
Mattia A. Di Gangi | Roldano Cattoni | Luisa Bentivogli | Matteo Negri | Marco Turchi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Current research on spoken language translation (SLT) must contend with the scarcity of sizeable and publicly available training corpora. This problem hinders the adoption of neural end-to-end approaches, which represent the state of the art in the two parent tasks of SLT: automatic speech recognition and machine translation. To fill this gap, we created MuST-C, a multilingual speech translation corpus whose size and quality will facilitate the training of end-to-end systems for SLT from English into 8 languages. For each target language, MuST-C comprises at least 385 hours of audio recordings from English TED Talks, which are automatically aligned at the sentence level with their manual transcriptions and translations. Together with a description of the corpus creation methodology (scalable to add new data and cover new languages), we provide an empirical verification of its quality and SLT results computed with a state-of-the-art approach on each language direction.

Enhancing Transformer for End-to-end Speech-to-Text Translation
Mattia Antonino Di Gangi | Matteo Negri | Roldano Cattoni | Roberto Dessi | Marco Turchi
Proceedings of Machine Translation Summit XVII: Research Track

2015

The IWSLT 2015 Evaluation Campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Roldano Cattoni | Marcello Federico
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

2012

The KnowledgeStore: an Entity-Based Storage System
Roldano Cattoni | Francesco Corcoglioniti | Christian Girardi | Bernardo Magnini | Luciano Serafini | Roberto Zanoli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the KnowledgeStore, a large-scale infrastructure for the combined storage and interlinking of multimedia resources and ontological knowledge. Information in the KnowledgeStore is organized around entities, such as persons, organizations and locations. The system makes it possible (i) to import background knowledge about entities, in the form of annotated RDF triples; (ii) to associate resources with entities by automatically recognizing, coreferring and linking mentions of named entities; and (iii) to derive new entities based on knowledge extracted from mentions. The KnowledgeStore builds on state-of-the-art technologies for language processing, including document tagging, named entity extraction and cross-document coreference. Its design provides for a tight integration of linguistic and semantic features, and eases the further processing of information by explicitly representing the contexts where knowledge and mentions are valid or relevant. We describe the system and report on the creation of a large-scale KnowledgeStore instance for storing and integrating multimedia content and background knowledge relevant to the Italian Trentino region.

2008

FBK @ IWSLT-2008.
Nicola Bertoldi | Roldano Cattoni | Marcello Federico | Madalina Barbaiani
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper reports on the participation of FBK at the IWSLT 2008 Evaluation. Most of the effort was spent on the Chinese-Spanish Pivot task. We implemented four methods to perform pivot translation. The results on the IWSLT 2008 test data show that our original method for generating training data through random sampling outperforms the best methods based on coupling translation systems. FBK also participated in the Chinese-English Challenge task and the Chinese-English and Chinese-Spanish BTEC tasks, employing the Moses Toolkit, a standard state-of-the-art MT system.

Phrase-based statistical machine translation with pivot languages.
Nicola Bertoldi | Madalina Barbaiani | Marcello Federico | Roldano Cattoni
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers

Translation with pivot languages has recently gained attention as a means to circumvent the data bottleneck of statistical machine translation (SMT). This paper tries to give a mathematically sound formulation of the various approaches presented in the literature and introduces new methods for training alignment models through pivot languages. We present experimental results on Chinese-Spanish translation via English, on a popular travel-domain task. In contrast to previous literature, we report experimental results using parallel corpora that are either disjoint or overlapping on the pivot language side. Finally, our original method for generating training data through random sampling is shown to perform as well as the best methods based on the coupling of translation systems.

2007

FBK@IWSLT 2007
Nicola Bertoldi | Mauro Cettolo | Roldano Cattoni | Marcello Federico
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper reports on the participation of FBK (formerly ITC-irst) at the IWSLT 2007 Evaluation. FBK participated in three tasks, namely Chinese-to-English, Japanese-to-English, and Italian-to-English. Unlike last year, translation systems were developed with the Moses Toolkit and the IRSTLM library, both available as open source software. Moreover, several novel ideas were investigated: the use of confusion networks as input to manage ambiguity in punctuation, the estimation of an additional language model by means of Google’s Web 1T 5-gram collection, the combination of true case and lower case language models, and finally the use of multiple phrase-tables. Working on top of a state-of-the-art baseline, experiments showed that the above methods accounted for significant BLEU score improvements.

2006

The ITC-irst SMT system for IWSLT 2006
Boxing Chen | Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

A Web-based Demonstrator of a Multi-lingual Phrase-based Translation System
Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Boxing Chen | Marcello Federico
Demonstrations

2005

The ITC-irst SMT System for IWSLT-2005
Boxing Chen | Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the Second International Workshop on Spoken Language Translation

2004

The Italian NESPOLE! Corpus: a Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
Nadia Mana | Roldano Cattoni | Emanuele Pianta | Franca Rossi | Fabio Pianesi | Susanne Burger
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The ITC-irst statistical machine translation system for IWSLT-
Nicola Bertoldi | Roldano Cattoni | Mauro Cettolo | Marcello Federico
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

2002

ADAM: The SI-TAL Corpus of Annotated Dialogues
Roldano Cattoni | Morena Danieli | Vanessa Sandrini | Claudia Soria
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Balancing Expressiveness and Simplicity in an Interlingua for Task Based Dialogue
Lori Levin | Donna Gates | Dorcas Pianta | Roldano Cattoni | Nadia Mana | Kay Peterson | Alon Lavie | Fabio Pianesi
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System
Alon Lavie | Florian Metze | Roldano Cattoni | Erica Costantini
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

2000

ADAM- An Architecture for xml-based Dialogue Annotation on Multiple levels
Claudia Soria | Roldano Cattoni | Morena Danieli
1st SIGdial Workshop on Discourse and Dialogue