Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization

Nayu Liu; Fanglong Yao; Haoran Luo; Yong Yang; Chen Tang; Bo Lv

doi:10.18653/v1/2025.acl-long.1229

Abstract

Multimodal summarization (MS) combines text and visuals to generate summaries. Recently, many-to-many multimodal summarization (M3S) has garnered interest because it enables a unified model for multilingual and cross-lingual MS. Existing methods have made progress by facilitating the transfer of common multimodal summarization knowledge. However, prior M3S models that fully share parameters neglect language-specific knowledge learning: potential interference between languages may limit the flexible adaptation of MS modes across different language combinations and hinder further collaborative improvements in joint M3S training. Based on this observation, we propose the Language Constrained Multimodal Hyper Adapter (LCMHA) for M3S. LCMHA integrates language-specific multimodal adapters into multilingual pre-trained backbones via a language constrained hypernetwork, enabling relaxed parameter sharing that enhances language-specific learning while preserving shared MS knowledge learning. In addition, a language-regularized hypernetwork is designed to balance intra- and inter-language learning: it generates language-specific adaptation weights and enhances the retention of distinct language features by regularizing the generated parameters. Experimental results on the M3Sum benchmark show LCMHA’s effectiveness and scalability across multiple multilingual pre-trained backbones.
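To make the idea of a hypernetwork that generates language-specific adapter weights more concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the class name `HyperAdapter`, the toy dimensions, the single linear generator, the bottleneck adapter shape, and in particular the `similarity_penalty` function, which stands in for the regularization of generated parameters using a simple mean pairwise cosine similarity. The abstract does not specify the actual regularizer or how multimodal features enter the adapter.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random rows x cols matrix as nested lists (toy initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a nested-list matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class HyperAdapter:
    """Hypothetical toy hypernetwork: language embedding -> adapter weights.

    One shared generator maps each language embedding to the flattened
    weights of a bottleneck adapter, so languages share the generator
    but receive different adapter parameters.
    """

    def __init__(self, languages, d_model=4, rank=2, d_lang=3, seed=0):
        rng = random.Random(seed)
        self.d_model, self.rank = d_model, rank
        # Assumed: one embedding vector per language (learnable in practice).
        self.lang_emb = {
            lang: [rng.uniform(-1.0, 1.0) for _ in range(d_lang)]
            for lang in languages
        }
        # Shared generator producing down- and up-projection weights.
        self.generator = make_matrix(2 * d_model * rank, d_lang, rng)

    def generate(self, lang):
        """Flattened adapter parameters for one language."""
        return matvec(self.generator, self.lang_emb[lang])

    def adapt(self, hidden, lang):
        """Apply the generated bottleneck adapter with a residual connection."""
        flat = self.generate(lang)
        d, r = self.d_model, self.rank
        down = [flat[i * d:(i + 1) * d] for i in range(r)]          # r x d
        offset = r * d
        up = [flat[offset + i * r:offset + (i + 1) * r] for i in range(d)]  # d x r
        bottleneck = [max(0.0, x) for x in matvec(down, hidden)]    # ReLU
        delta = matvec(up, bottleneck)
        return [h + dl for h, dl in zip(hidden, delta)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def similarity_penalty(adapter, languages):
    """Assumed regulariser: mean pairwise cosine similarity between the
    parameter vectors generated for different languages. Minimising it
    would push the generated adapters apart."""
    pairs = [(a, b) for i, a in enumerate(languages) for b in languages[i + 1:]]
    total = sum(cosine(adapter.generate(a), adapter.generate(b)) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    langs = ["en", "zh", "fr"]
    adapter = HyperAdapter(langs)
    hidden_state = [0.5, -0.2, 0.1, 0.3]
    for lang in langs:
        print(lang, adapter.adapt(hidden_state, lang))
    print("penalty:", similarity_penalty(adapter, langs))
```

In a real system the penalty would be added to the summarization loss with a weighting coefficient, and the generator and language embeddings would be trained jointly with the backbone adapters; those details are left out here because the abstract does not state them.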

Anthology ID: 2025.acl-long.1229
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 25285–25298
URL: https://aclanthology.org/2025.acl-long.1229/
DOI: 10.18653/v1/2025.acl-long.1229
Cite (ACL): Nayu Liu, Fanglong Yao, Haoran Luo, Yong Yang, Chen Tang, and Bo Lv. 2025. Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25285–25298, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization (Liu et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1229.pdf
