MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

Woohyun Cho; Youngmin Kim; Sunghyun Lee; Youngjae Yu

doi:10.18653/v1/2025.emnlp-main.689

Abstract

Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.

Anthology ID: 2025.emnlp-main.689
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 13640–13668
URL: https://aclanthology.org/2025.emnlp-main.689/
DOI: 10.18653/v1/2025.emnlp-main.689
Cite (ACL): Woohyun Cho, Youngmin Kim, Sunghyun Lee, and Youngjae Yu. 2025. MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13640–13668, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation (Cho et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.689.pdf
Checklist: 2025.emnlp-main.689.checklist.pdf