Jiayang Chen


2024

pdf bib
Document Hashing with Multi-Grained Prototype-Induced Hierarchical Generative Model
Qian Zhang | Qinliang Su | Jiayang Chen | Zhenpeng Song
Findings of the Association for Computational Linguistics: EMNLP 2024

Document hashing plays a crucial role in large-scale information retrieval. However, existing unsupervised document hashing methods merely consider flat semantics of documents, resulting in the inability of preserving hierarchical semantics in hash codes. In this paper, we propose a hierarchical generative model that can model and leverage the hierarchical structure of semantics. Specifically, we introduce hierarchical prototypes into the model to construct a hierarchical prior distribution, which is integrated into the variational auto-encoder (VAE) framework, enabling the model to produce hash codes preserving rough hierarchical semantics. To further promote the preservation of hierarchical structure, we force the hash code to preserve as much semantic information as possible via contrastive learning, which exploits the hierarchical pseudo labels produced during VAE training. Extensive experiments on three benchmarks outperform all baseline methods, demonstrating the superiority of our proposed model on both hierarchical datasets and flat datasets.