Constructing a Multimodal, Multilingual Translation and Interpreting Corpus: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Alice Fedotova; Adriano Ferraresi; Maja Miličević Petrović; Alberto Barrón-Cedeño

Constructing a Multimodal, Multilingual Translation and Interpreting Corpus: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Alice Fedotova, Adriano Ferraresi, Maja Miličević Petrović, Alberto Barrón-Cedeño

Abstract

This paper presents a novel pipeline for constructing multimodal and multilingual parallel corpora, with a focus on evaluating state-of-the-art ASR tools for verbatim transcription. Our findings indicate that current technologies can streamline corpus construction, with fine-tuning showing promising results in terms of transcription quality compared to out-of-the-box Whisper models. The lowest overall WER achieved for English was 0.180, using a fine-tuned Whisper-small model. As for Italian, the fine-tuned Whisper-small model obtained a lower WER of 0.201 compared to the baseline Whisper-small’s WER of 0.219. While limitations remain, the updated pipeline is expected to drastically reduce the human efforts involved.

Anthology ID:: 2024.clicit-1.42
Volume:: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:: December
Year:: 2024
Address:: Pisa, Italy
Editors:: Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:: CLiC-it
SIG:
Publisher:: CEUR Workshop Proceedings
Note:
Pages:: 349–355
Language:
URL:: https://aclanthology.org/2024.clicit-1.42/
DOI:
Bibkey:
Cite (ACL):: Alice Fedotova, Adriano Ferraresi, Maja Miličević Petrović, and Alberto Barrón-Cedeño. 2024. Constructing a Multimodal, Multilingual Translation and Interpreting Corpus: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 349–355, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):: Constructing a Multimodal, Multilingual Translation and Interpreting Corpus: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription (Fedotova et al., CLiC-it 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.clicit-1.42.pdf

PDF Cite Search Fix data