@inproceedings{geng-etal-2025-graph,
title = "Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning",
author = "Geng, Yuxia and
Zhu, Runkai and
Chen, Jiaoyan and
Chen, Jintai and
Chen, Xiang and
Chen, Zhuo and
Qiao, Shuofei and
Wang, Yuxiang and
Xu, Xiaoliang and
Huang, Sheng-Jun",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.137/",
doi = "10.18653/v1/2025.findings-acl.137",
pages = "2678--2690",
ISBN = "979-8-89176-256-5",
abstract = "Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end, we propose the solution of \textit{cross-composition feature disentanglement}, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions. More specifically, we leverage a compositional graph to define the overall primitive-sharing relationships between compositions, and build a task-specific architecture upon the recently successful large pre-trained vision-language model (VLM) CLIP, with dual cross-composition disentangling adapters (called L-Adapter and V-Adapter) inserted into CLIP{'}s frozen text and image encoders, respectively. Evaluation on three popular CZSL benchmarks shows that our proposed solution significantly improves the performance of CZSL, and its components have been verified by solid ablation studies. Our code and data are available at: https://github.com/zhurunkai/DCDA."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="geng-etal-2025-graph">
<titleInfo>
<title>Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yuxia</namePart>
<namePart type="family">Geng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Runkai</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiaoyan</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jintai</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiang</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhuo</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shuofei</namePart>
<namePart type="family">Qiao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuxiang</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiaoliang</namePart>
<namePart type="family">Xu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sheng-Jun</namePart>
<namePart type="family">Huang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end, we propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions. More specifically, we leverage a compositional graph to define the overall primitive-sharing relationships between compositions, and build a task-specific architecture upon the recently successful large pre-trained vision-language model (VLM) CLIP, with dual cross-composition disentangling adapters (called L-Adapter and V-Adapter) inserted into CLIP’s frozen text and image encoders, respectively. Evaluation on three popular CZSL benchmarks shows that our proposed solution significantly improves the performance of CZSL, and its components have been verified by solid ablation studies. Our code and data are available at: https://github.com/zhurunkai/DCDA.</abstract>
<identifier type="citekey">geng-etal-2025-graph</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.137</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.137/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>2678</start>
<end>2690</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning
%A Geng, Yuxia
%A Zhu, Runkai
%A Chen, Jiaoyan
%A Chen, Jintai
%A Chen, Xiang
%A Chen, Zhuo
%A Qiao, Shuofei
%A Wang, Yuxiang
%A Xu, Xiaoliang
%A Huang, Sheng-Jun
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F geng-etal-2025-graph
%X Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end, we propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions. More specifically, we leverage a compositional graph to define the overall primitive-sharing relationships between compositions, and build a task-specific architecture upon the recently successful large pre-trained vision-language model (VLM) CLIP, with dual cross-composition disentangling adapters (called L-Adapter and V-Adapter) inserted into CLIP’s frozen text and image encoders, respectively. Evaluation on three popular CZSL benchmarks shows that our proposed solution significantly improves the performance of CZSL, and its components have been verified by solid ablation studies. Our code and data are available at: https://github.com/zhurunkai/DCDA.
%R 10.18653/v1/2025.findings-acl.137
%U https://aclanthology.org/2025.findings-acl.137/
%U https://doi.org/10.18653/v1/2025.findings-acl.137
%P 2678-2690
Markdown (Informal):
[Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning](https://aclanthology.org/2025.findings-acl.137/) (Geng et al., Findings 2025)

ACL:
Yuxia Geng, Runkai Zhu, Jiaoyan Chen, Jintai Chen, Xiang Chen, Zhuo Chen, Shuofei Qiao, Yuxiang Wang, Xiaoliang Xu, and Sheng-Jun Huang. 2025. Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 2678–2690, Vienna, Austria. Association for Computational Linguistics.