In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Although large language models (LLMs) have improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observe that textual prompting often struggles to yield responses that are similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate these issues, we propose **S**elective **P**rompt **T**uning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for LLMs according to different input contexts, where the prompt retriever is dynamically updated through feedback from the LLMs. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage SPT to enhance the diversity of personalized conversations. Experiments on the CONVAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators. These results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code is [publicly available](https://github.com/hqsiswiliam/SPT) for further exploration.
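The selection step in the SPT abstract above (a retriever choosing one soft prompt per input context) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say how the retriever scores prompts, so cosine similarity between a context vector and one key vector per prompt is used as a stand-in, and the names `cosine` and `select_prompt` are hypothetical. The retriever training through LLM feedback, the contrastive loss, and prompt fusion are all omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_prompt(context_vec, prompt_keys):
    """Return (index of best-scoring prompt, list of all scores).

    Assumption: each soft prompt has one key vector, and the retriever
    picks the prompt whose key is most similar to the encoded context.
    """
    scores = [cosine(context_vec, key) for key in prompt_keys]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

In a real system the context vector and prompt keys would come from trained encoders; here they are plain lists of floats so that the selection logic is visible on its own.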
Session-based recommendation (SBR) is a challenging task that involves predicting a user’s next item click based on their recent session history. Many state-of-the-art methods use graph neural networks to model item transitions. Despite their strong performance, graph-based models struggle with intricate session dependencies and data sparsity in real-world scenarios, which limits their recommendation accuracy. To address these challenges, we introduce Mssen, short for Multi-collaborative self-supervised learning in hypergraph neural networks, which is designed to discern user intent. Our approach first represents session-based data as a hypergraph to capture intricate, high-order relationships. We then apply self-supervised learning on item-session hypergraphs to mitigate data sparsity, without requiring manual fine-tuning, extensive search, or domain-specific expertise in augmentation selection. Experiments across multiple datasets consistently show that our approach outperforms existing methods.
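As a rough illustration of what "representing session-based data as a hypergraph" in the abstract above can mean, the sketch below builds an item-by-session incidence matrix. It assumes the common construction in which each session is one hyperedge joining all items clicked in it; the abstract does not confirm that Mssen uses exactly this construction, and the function names are hypothetical.

```python
def build_incidence(sessions):
    """Return (items, H) where H[i][j] == 1 iff item i occurs in session j.

    Assumption: every session is treated as a single hyperedge over its items.
    """
    items = sorted({item for session in sessions for item in session})
    index = {item: i for i, item in enumerate(items)}
    H = [[0] * len(sessions) for _ in items]
    for j, session in enumerate(sessions):
        for item in session:
            H[index[item]][j] = 1
    return items, H


def node_degrees(H):
    """Number of hyperedges (sessions) each item belongs to."""
    return [sum(row) for row in H]
```

Unlike an ordinary graph edge, a single column of `H` can connect any number of items, which is how a hypergraph encodes the high-order relationships mentioned in the abstract.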
Graph neural networks (GNNs) play a fundamental role in anomaly detection, identifying node anomalies by aggregating information from neighboring nodes. However, they are vulnerable to attacks: even minor alterations to the graph structure or node attributes can cause substantial performance degradation. To address this challenge, we introduce a graph adversarial training mechanism designed to harden GNN-based anomaly detection systems against poisoning attacks. The approach follows a two-step framework. (1) In the first step, we employ a Multiple-Objective Generative Adversarial Attack (MO-GAA), which generates feature modifications and structural disruptions within the graph. Its objective is to mimic the behavior of potential attackers on the anomaly detection graph, with the explicit intention of confounding the anomaly detector. (2) In the second step, we introduce Purification-Based Adversarial Attack Defense (PB-AAD), a method designed to remove contamination and restore the integrity of the graph, counteracting the attackers’ destructive actions. Extensive experiments on four real-world anomaly detection datasets demonstrate that MO-GAA systematically disrupts the graph and compromises the effectiveness of GNN-based detectors, while PB-AAD effectively mitigates these adversarial actions and thereby enhances the overall robustness of GNN-based anomaly detectors.
Image-caption pretraining has been used quite successfully for downstream vision tasks such as zero-shot image classification and object detection. However, image-caption pretraining remains a hard problem: it requires multiple concepts (nouns) from captions to be aligned to several objects in images. To tackle this problem, we go back to the best learners, children. We take inspiration from cognitive science studies of children’s language learning to propose a curriculum learning framework. Learning begins with easy-to-align image-caption pairs containing one concept per caption. The difficulty is progressively increased with each new phase by adding one more concept per caption. Correspondingly, the knowledge acquired in each learning phase is used in subsequent phases to constrain the learning problem to aligning one new concept-object pair in each phase. We show that this learning strategy improves over vanilla image-caption training in various settings: pretraining from scratch, using a pretrained image and/or text encoder, and the low-data regime.
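The phase structure of the curriculum described above (phase 1 uses captions with one concept, phase 2 captions with two, and so on) can be sketched as a simple bucketing step. This is only an illustration under stated assumptions: the abstract identifies concepts with nouns but does not say how they are extracted, so a hypothetical fixed vocabulary `concept_vocab` stands in for a noun extractor, and `build_curriculum` is an invented name. How knowledge from earlier phases constrains later alignment is not modeled.

```python
import re
from collections import defaultdict


def build_curriculum(pairs, concept_vocab):
    """Group (image_id, caption) pairs into phases by concept count.

    Returns a list of (k, pairs_with_k_concepts), ordered by increasing k.
    Assumption: a "concept" is any caption word found in concept_vocab.
    Captions with no known concept are skipped.
    """
    phases = defaultdict(list)
    for image_id, caption in pairs:
        words = set(re.findall(r"[a-z]+", caption.lower()))
        k = len(words & concept_vocab)
        if k > 0:
            phases[k].append((image_id, caption))
    return [(k, phases[k]) for k in sorted(phases)]
```

A training loop would then iterate over the returned phases in order, so that each phase adds at most one more concept per caption than the previous one.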
Knowledge graph completion (KGC) aims to predict unseen edges in knowledge graphs (KGs), resulting in the discovery of new facts. A new class of methods has been proposed to tackle this problem by aggregating path information. These methods have shown tremendous ability in the task of KGC. However, they are plagued by efficiency issues. Although a few recent attempts address this through learnable path pruning, they often sacrifice performance to gain efficiency. In this work, we identify two intrinsic limitations of these methods that affect efficiency and representation quality. To address these limitations, we introduce a new method, TAGNet, which propagates information efficiently by aggregating only the paths in a fixed window for each source-target pair. We demonstrate that the complexity of TAGNet is independent of the number of layers. Extensive experiments demonstrate that TAGNet can cut the number of propagated messages by as much as 90% while achieving competitive performance on multiple KG datasets.
The human recognition system exhibits a remarkable ability to effortlessly learn novel knowledge from only a few trigger events based on prior knowledge, which is called insight learning. Mimicking such behavior in Knowledge Graph Reasoning (KGR) is an interesting and challenging research problem with many practical applications. However, existing works, such as knowledge embedding and few-shot learning models, have been limited to conducting KGR in either “seen-to-seen” or “unseen-to-unseen” scenarios. To this end, we propose a neural insight learning framework named Eureka to bridge the gap between “seen” and “unseen”. Eureka learns seen relations from sufficient training triples while providing the flexibility to learn unseen relations given only one trigger, without sacrificing its performance on seen relations. Eureka meets our expectation that the model acquires seen and unseen relations at no extra cost, and it eliminates the need to retrain when encountering emerging unseen relations. Experimental results on two real-world datasets demonstrate that the proposed framework outperforms various state-of-the-art baselines on datasets of both seen and unseen relations.
Natural Language Sentence Matching (NLSM) serves as the core of many natural language processing tasks. 1) Most previous work develops a single specific neural model for NLSM tasks. 2) No previous work considers adversarial attacks as a way to improve performance on NLSM tasks. 3) Adversarial attacks are usually used to generate adversarial samples that can fool neural models. In this paper, we first observe that different categories of samples have different vulnerabilities, where vulnerability is the degree of difficulty of changing the label of a sample. Based on this observation, we propose a general two-stage training framework to enhance neural models with Vulnerability via Adversarial Attack (VAA). We design criteria to measure vulnerability, which is obtained by adversarial attack. The VAA framework can be adapted to various neural models by incorporating the vulnerability. In addition, we prove a theorem and four corollaries to explain the factors influencing the effectiveness of vulnerability. Experimental results show that VAA significantly improves the performance of neural models on NLSM datasets. The results are also consistent with the theorem and corollaries. The code is released at https://github.com/rzhangpku/VAA.
We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology. Our system, GAIA, enables seamless search with complex graph queries, and retrieves multimedia evidence including text, images, and videos. GAIA achieves top performance at the recent NIST TAC SM-KBP2019 evaluation. The system is publicly available on GitHub and DockerHub, with a narrated video that documents the system.