RelCLIP: Adapting Language-Image Pretraining for Visual Relationship Detection via Relational Contrastive Learning
Yi Zhu | Zhaoqing Zhu | Bingqian Lin | Xiaodan Liang | Feng Zhao | Jianzhuang Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Conventional visual relationship detection models use only the numeric IDs of relation labels for training and ignore the semantic correlations between labels, which leads to severe training biases and harms the generalization ability of the learned representations. In this paper, we introduce compact language information about relation labels to regularize the representation learning of visual relations. Specifically, we propose a simple yet effective visual Relationship prediction framework, termed RelCLIP, that transfers natural language knowledge learned by Contrastive Language-Image Pre-training (CLIP) models to enhance relationship prediction. Building on CLIP's strong image-level visual-semantic alignment, we introduce a novel Relational Contrastive Learning (RCL) approach that explores relation-level visual-semantic alignment by learning to match cross-modal relational embeddings. By jointly learning semantic coherence and discrepancy from relation triplets, the model generates more discriminative and robust representations. Experimental results on the Visual Genome dataset show that RelCLIP achieves significant improvements over strong baselines under both full supervision (accurate labels) and distant supervision (noisy labels), demonstrating its strong generalization ability in learning relationship representations. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/RelCLIP.
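The abstract describes matching visual relation embeddings with the language embeddings of relation labels. The sketch below is a rough, hypothetical illustration and not the authors' RCL objective, whose exact form is not given here. It scores each visual relation embedding against a text embedding for every relation label using temperature-scaled cosine similarity, as in CLIP-style zero-shot classification, and then applies cross-entropy over the labels. The embeddings, label set, function names, and the temperature of 0.07 are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def relation_logits(visual: Vector, label_text: Sequence[Vector], temperature: float) -> List[float]:
    """Temperature-scaled cosine similarity between one visual relation
    embedding and the text embedding of every relation label."""
    v = l2_normalize(visual)
    return [sum(a * b for a, b in zip(v, l2_normalize(t))) / temperature for t in label_text]

def log_softmax_at(logits: List[float], index: int) -> float:
    """Numerically stable log-softmax probability of a single entry."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z

def relation_contrastive_loss(
    visual_batch: Sequence[Vector],
    labels: Sequence[int],
    label_text: Sequence[Vector],
    temperature: float = 0.07,  # hypothetical value (CLIP's initial temperature)
) -> float:
    """Mean cross-entropy that pulls each visual relation embedding toward the
    text embedding of its own label and away from every other label."""
    total = 0.0
    for visual, y in zip(visual_batch, labels):
        total -= log_softmax_at(relation_logits(visual, label_text, temperature), y)
    return total / len(labels)

def predict(visual: Vector, label_text: Sequence[Vector]) -> int:
    """Index of the label whose text embedding is most similar; the argmax
    does not depend on the temperature, so 1.0 is used here."""
    logits = relation_logits(visual, label_text, temperature=1.0)
    return max(range(len(logits)), key=logits.__getitem__)
```

For example, with one-hot text embeddings for three hypothetical predicates such as "on", "has", and "wearing", a batch whose visual embeddings point toward their own labels yields a lower loss than the same batch with its labels swapped.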
Can Language Models Serve as Temporal Knowledge Bases?
Ruilin Zhao | Feng Zhao | Guandong Xu | Sixiao Zhang | Hai Jin
Findings of the Association for Computational Linguistics: EMNLP 2022
Recent progress on using language models (LMs) as knowledge bases (KBs) has shown that LMs can act as structured knowledge bases for storing relational facts. However, most existing work considers the LM-as-KB paradigm only in a static setting, ignoring the temporal dynamics of world knowledge. Furthermore, a basic function of KBs, namely the ability to store conflicting information (i.e., 1-N, N-1, and N-M relations), remains underexplored. In this paper, we formulate two practical requirements for treating LMs as temporal KBs: (i) the capacity to store temporally-scoped knowledge that contains conflicting information, and (ii) the ability to use stored knowledge for temporally-scoped knowledge queries. We introduce a new dataset, LAMA-TK, aimed at probing temporally-scoped knowledge, and investigate the two requirements above to explore the LM-as-KB paradigm in the temporal domain. On the one hand, experiments show that LMs can memorize millions of temporally-scoped facts with relatively high accuracy and transfer stored knowledge to temporal knowledge queries, thereby extending the LM-as-KB paradigm to the temporal domain. On the other hand, we show that memorizing conflicting information, which previous work has neglected, is still challenging for LMs and hinders the memorization of other, unrelated one-to-one relationships.
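The following sketch makes "temporally-scoped knowledge that contains conflicting information" concrete. It expands time-bounded triples into per-year cloze queries and groups the gold answers by query, so any query with more than one answer is conflicting. The `TemporalFact` class, the relation name, the template, and the helper functions are hypothetical and are not taken from LAMA-TK.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

@dataclass(frozen=True)
class TemporalFact:
    """A relational fact that holds only within a closed range of years."""
    subject: str
    relation: str
    obj: str
    start: int  # first year the fact holds (inclusive)
    end: int    # last year the fact holds (inclusive)

# Hypothetical cloze template; LAMA-TK's real relations and templates may differ.
TEMPLATES = {
    "member_of_sports_team": "In {year}, {subject} played for [MASK].",
}

def cloze_queries(fact: TemporalFact) -> List[Tuple[str, str]]:
    """Expand one fact into (masked query, gold object) pairs, one per year."""
    template = TEMPLATES[fact.relation]
    return [
        (template.format(year=year, subject=fact.subject), fact.obj)
        for year in range(fact.start, fact.end + 1)
    ]

def answers_by_query(facts: Iterable[TemporalFact]) -> Dict[str, Set[str]]:
    """Collect every gold object for each query text."""
    gold: Dict[str, Set[str]] = defaultdict(set)
    for fact in facts:
        for query, obj in cloze_queries(fact):
            gold[query].add(obj)
    return dict(gold)

def conflicting_queries(gold: Dict[str, Set[str]]) -> List[str]:
    """Queries with more than one valid answer, i.e. conflicting (1-N or N-M) knowledge."""
    return [query for query, objs in gold.items() if len(objs) > 1]
```

Consider a fictional "Player A" with two overlapping memberships: Club X from 2010 to 2011 and Club Y from 2011 to 2012. The 2010 and 2012 queries each have one answer, but the 2011 query has two. Temporal scoping resolves some conflicts, yet overlapping intervals still leave 1-N answers that an LM would have to store.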
OpticE: A Coherence Theory-Based Model for Link Prediction
Xiangyu Gui | Feng Zhao | Langjunqing Jin | Hai Jin
Proceedings of the 29th International Conference on Computational Linguistics
Knowledge representation learning is a key step in link prediction over knowledge graphs (KGs). During learning, the semantics of each entity are embedded as a vector, i.e., a point in a feature space, and the distance between these points measures semantic similarity. However, in a KG, two entities may have similar semantics under some relations but different semantics under others, so assigning a single fixed distance to describe their varying semantic similarity is ambiguous. To alleviate this semantic ambiguity in KGs, we design a new embedding approach named OpticE, derived from the well-known physical phenomenon of optical interference. It is a lightweight, relation-adaptive model based on coherence theory, in which each entity’s semantics vary automatically across different relations. In addition, we propose a unique negative sampling method that combines multi-mapping properties with self-adversarial learning during training. Experimental results on practical KG benchmarks show that OpticE, with its elegant structure, is competitive with existing link prediction methods.
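The abstract does not give OpticE's coherence-based score or the details of its multi-mapping sampler. The sketch below covers only the self-adversarial part. It follows a common formulation, the self-adversarial negative sampling loss introduced with RotatE (Sun et al., 2019), which weights each negative triple by a softmax over its plausibility so that harder negatives contribute more. It assumes some distance function has already produced the distances. The margin `gamma`, the temperature `alpha`, and the function names are illustrative assumptions, not OpticE's settings.

```python
import math
from typing import List, Sequence

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def self_adversarial_weights(neg_distances: Sequence[float], alpha: float) -> List[float]:
    """Softmax over negated, alpha-scaled distances: negatives that look more
    plausible (smaller distance) receive larger weights."""
    scores = [-alpha * d for d in neg_distances]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def self_adversarial_loss(
    pos_distance: float,
    neg_distances: Sequence[float],
    gamma: float = 6.0,  # hypothetical margin
    alpha: float = 1.0,  # hypothetical sampling temperature
) -> float:
    """Margin-based loss: push the positive triple's distance below gamma and
    the weighted negatives' distances above it."""
    weights = self_adversarial_weights(neg_distances, alpha)
    pos_term = -log_sigmoid(gamma - pos_distance)
    neg_term = -sum(w * log_sigmoid(d - gamma) for w, d in zip(weights, neg_distances))
    return pos_term + neg_term
```

In a trained model, the weights come from the current embeddings but are treated as constants during back-propagation. This plain-Python version only evaluates the loss. Setting `alpha` to 0 recovers uniform weighting over the negatives.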
- Hai Jin 2
- Yi Zhu 1
- Zhaoqing Zhu 1
- Bingqian Lin 1
- Xiaodan Liang 1