DMSD: Dual-Modal Semantic Disentanglement for Compositional Zero-Shot Learning

Pan Yang; Jing Yang; Ruan Xiao li; Yuling Chen; Yuankai Wu; Quan Zhou; Xu Wang

Abstract

The core challenge of Compositional Zero-Shot Learning (CZSL) lies in learning representations of sub-concepts (attributes and objects) from seen compositions and recognizing unseen novel compositions. Most existing CZSL methods focus on prompt optimization on the textual side and overlook the insufficient disentanglement of visual attribute–object sub-concepts under a text-centric paradigm. To this end, we propose DMSD, a Dual-Modal Semantic Disentanglement framework that jointly models visual and textual information to achieve effective sub-concept disentanglement. Specifically, DMSD introduces a Contextual Prompt Space, enabling both visual and textual modalities to be modeled under unified contextual semantic representations, thereby enhancing their alignment at the latent semantic level. Moreover, we design Visual Sub-concept Prototypes that explicitly extract and model visual sub-concept features, improving the independence and discriminability of visual sub-concept representations. Furthermore, to achieve fine-grained alignment between visual and textual sub-concepts, we propose a Class-Centroid Bridging Module that guides class centroids toward the textual semantic space, thereby ensuring cross-modal semantic consistency. Extensive experiments on three benchmark datasets (MIT-States, UT-Zappos, and C-GQA) demonstrate that DMSD achieves state-of-the-art performance in both closed-world and open-world settings. Our code is available at https://anonymous.4open.science/r/DMSD-9CC4.

Anthology ID: 2026.findings-acl.1540
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 30815–30826
URL: https://aclanthology.org/2026.findings-acl.1540/
Cite (ACL): Pan Yang, Jing Yang, Ruan Xiao li, Yuling Chen, Yuankai Wu, Quan Zhou, and Xu Wang. 2026. DMSD: Dual-Modal Semantic Disentanglement for Compositional Zero-Shot Learning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 30815–30826, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): DMSD: Dual-Modal Semantic Disentanglement for Compositional Zero-Shot Learning (Yang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1540.pdf
Checklist: 2026.findings-acl.1540.checklist.pdf
