Choosing What to Mask: More Informed Masking for Multimodal Machine Translation

Julia Sato, Helena Caseli, Lucia Specia


Abstract
Pre-trained language models have achieved remarkable results on several NLP tasks. Most of them adopt masked language modeling, learning representations by randomly masking tokens and predicting them from their context. However, this random selection of tokens to mask is inefficient for learning some language patterns, as it may not consider linguistic information that can be helpful for many NLP tasks, such as multimodal machine translation (MMT). Hence, we propose three novel masking strategies for cross-lingual visual pre-training - more informed visual masking, more informed textual masking, and more informed visual and textual masking - each focusing on learning different linguistic patterns. We apply them to Vision Translation Language Modelling for video subtitles (Sato et al., 2022) and conduct extensive experiments on the Portuguese-English MMT task. The results show that our masking approaches yield significant improvements over the original random masking strategy in downstream MMT performance. Our models outperform the MMT baseline, and we achieve state-of-the-art accuracy (52.70 BLEU) on the How2 dataset, indicating that more informed masking helps the model acquire an understanding of specific language structures and has great potential for language understanding.
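To make the contrast with standard masked language modeling concrete, the sketch below compares uniform random masking with an "informed" variant that prefers a designated set of candidate positions. This is a minimal illustration, not the paper's implementation: the abstract does not specify the selection criteria, so the content-word heuristic (`is_informative`) and all function names here are assumptions.

```python
# Minimal sketch: random vs. informed token masking for MLM pre-training.
# Hypothetical illustration only; the paper's actual selection criteria
# (visual and textual) are not described in the abstract.
import random

MASK = "[MASK]"

def random_masking(tokens, ratio=0.15, rng=None):
    """Standard MLM masking: choose positions uniformly at random."""
    rng = rng or random.Random(0)
    n = max(1, int(len(tokens) * ratio))
    positions = rng.sample(range(len(tokens)), n)
    return [MASK if i in positions else t for i, t in enumerate(tokens)], positions

def informed_masking(tokens, is_informative, ratio=0.15, rng=None):
    """Informed masking: prefer positions flagged as informative (e.g.,
    visually grounded content words); fall back to random positions if
    there are too few candidates."""
    rng = rng or random.Random(0)
    n = max(1, int(len(tokens) * ratio))
    candidates = [i for i, t in enumerate(tokens) if is_informative(t)]
    rng.shuffle(candidates)
    positions = candidates[:n]
    if len(positions) < n:  # pad with random non-candidate positions
        rest = [i for i in range(len(tokens)) if i not in positions]
        positions += rng.sample(rest, n - len(positions))
    return [MASK if i in positions else t for i, t in enumerate(tokens)], positions

if __name__ == "__main__":
    sent = "the girl throws the ball to the dog".split()
    nouns = {"girl", "ball", "dog"}  # toy stand-in for a POS tagger or visual grounding
    print(random_masking(sent, ratio=0.3))
    print(informed_masking(sent, lambda t: t in nouns, ratio=0.3))
```

The design point is only that the masking distribution is biased toward linguistically or visually salient tokens rather than being uniform; any real implementation would derive the candidate set from the modalities being pre-trained.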
Anthology ID:
2023.acl-srw.35
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Vishakh Padmakumar, Gisela Vallejo, Yao Fu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
244–253
URL:
https://aclanthology.org/2023.acl-srw.35
DOI:
10.18653/v1/2023.acl-srw.35
Cite (ACL):
Julia Sato, Helena Caseli, and Lucia Specia. 2023. Choosing What to Mask: More Informed Masking for Multimodal Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 244–253, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Choosing What to Mask: More Informed Masking for Multimodal Machine Translation (Sato et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-srw.35.pdf