The Effects of Pretraining in Video-Guided Machine Translation

Ammon Shurtz; Lawry Sorenson; Stephen D. Richardson

Abstract

We propose an approach that improves the performance of video-guided machine translation (VMT) models, which integrate text and video modalities. We experiment with the MAD (Movie Audio Descriptions) dataset, a new dataset containing transcribed audio descriptions of movies. We find that MAD is more lexically rich than VATEX, the current standard VMT benchmark dataset, and we experiment with pretraining on MAD to improve performance on VATEX. We compare two video encoder architectures: a Conformer (Convolution-augmented Transformer) and a Transformer. Additionally, we mask the source sentences to assess how much the performance of both architectures improves as a result of pretraining on additional video data. Finally, we analyze the transfer-learning potential of a video dataset and compare it with pretraining on a text-only dataset. Our findings show that pretraining on a lexically rich dataset leads to significant improvements in model performance when models use both text and video modalities.
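
The abstract does not state which lexical-richness measure underlies the MAD vs. VATEX comparison. As a minimal sketch, assuming simple type-token statistics as the measure, two corpora could be compared as follows. The functions `lexical_stats` and `mean_segmental_ttr` and the regex tokenizer are hypothetical illustrations, not the paper's method, and the toy sentences are invented rather than drawn from either dataset.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase, then keep runs of letters/apostrophes as tokens (simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_stats(sentences: list[str]) -> dict:
    """Token count, vocabulary size (types) and raw type-token ratio for a corpus."""
    tokens = [tok for s in sentences for tok in tokenize(s)]
    n_tokens = len(tokens)
    n_types = len(Counter(tokens))
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
    }


def mean_segmental_ttr(sentences: list[str], segment: int = 100) -> float:
    """Average TTR over consecutive, non-overlapping windows of `segment` tokens.

    Tokens left over after the last full window are ignored.
    """
    tokens = [tok for s in sentences for tok in tokenize(s)]
    segments = [tokens[i:i + segment]
                for i in range(0, len(tokens) - segment + 1, segment)]
    if not segments:
        raise ValueError("corpus is shorter than one segment")
    return sum(len(set(seg)) / segment for seg in segments) / len(segments)


if __name__ == "__main__":
    # Invented toy sentences, NOT taken from MAD or VATEX.
    caption_like = ["a man is playing a guitar", "a man is playing a drum"]
    description_like = ["someone strums a battered guitar on the porch",
                        "rain lashes the darkened windows of the farmhouse"]
    print(lexical_stats(caption_like))      # ttr 6/12 = 0.5
    print(lexical_stats(description_like))  # ttr 14/16 = 0.875
```

Mean segmental TTR is included because raw TTR falls as a corpus grows, so comparing corpora of different sizes on raw TTR alone would be biased. Computing it over equal-length windows puts both corpora on the same footing.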

Anthology ID:: 2024.lrec-main.1380
Volume:: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:: LREC | COLING
Publisher:: ELRA and ICCL
Pages:: 15888–15898
URL:: https://aclanthology.org/2024.lrec-main.1380/
Cite (ACL):: Ammon Shurtz, Lawry Sorenson, and Stephen D. Richardson. 2024. The Effects of Pretraining in Video-Guided Machine Translation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15888–15898, Torino, Italia. ELRA and ICCL.
Cite (Informal):: The Effects of Pretraining in Video-Guided Machine Translation (Shurtz et al., LREC-COLING 2024)
PDF:: https://aclanthology.org/2024.lrec-main.1380.pdf