In recent years, large language models (LLMs) have attracted significant interest from the research community due to their broad applicability to many language-oriented tasks, and they are now widely used in numerous areas of production and daily life. One source of the powerful capabilities of LLMs is the massive scale of their pre-training datasets. However, these pre-training datasets contain much outdated, harmful, and personally sensitive information, which is inevitably memorized by the LLM during pre-training. Eliminating this undesirable data is crucial for ensuring the model's safety and enhancing the user experience. However, the cost of extensively cleaning the pre-training dataset and retraining the model from scratch is very high. In this work, we propose ULMR, an unlearning framework for LLMs, which first uses carefully designed prompts to rewrite the instructions in the specified dataset and generate corresponding negative responses. Subsequently, to ensure that the model does not deviate excessively after training, we perform model parameter averaging to preserve the performance of the original LLM. We conducted experiments on two public datasets, TOFU and RWKU, demonstrating that our method can effectively forget specified information while retaining the capabilities of the original LLM.
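To make the parameter-averaging step concrete, below is a minimal sketch assuming PyTorch `nn.Module` inputs; the mixing coefficient `alpha` and the simple linear interpolation are illustrative assumptions, not ULMR's exact recipe.

```python
def average_parameters(original_model, unlearned_model, alpha=0.5):
    """Interpolate between the original and post-unlearning weights.

    Assumes both arguments are PyTorch nn.Modules with identical
    architectures. `alpha` is a hypothetical mixing coefficient:
    0 keeps the original model unchanged, 1 keeps the fully
    unlearned model.
    """
    orig_state = original_model.state_dict()
    new_state = unlearned_model.state_dict()
    # Element-wise weighted average of every parameter tensor.
    merged = {
        name: (1 - alpha) * orig_state[name] + alpha * new_state[name]
        for name in orig_state
    }
    unlearned_model.load_state_dict(merged)
    return unlearned_model
```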
Contrastive learning has achieved impressive success in generation tasks, mitigating the "exposure bias" problem and discriminatively exploiting references of different quality. Existing works mostly focus on contrastive learning at the instance level without discriminating the contribution of each word, even though keywords are the gist of the text and dominate the constrained mapping relationships. Hence, in this work, we propose a hierarchical contrastive learning mechanism that unifies semantic meanings at hybrid granularities in the input text. Concretely, we first build a keyword graph via contrastive correlations of positive-negative pairs to iteratively polish the keyword representations. Then, we construct intra-contrasts at the instance level and the keyword level, where we assume words are sampled nodes from a sentence distribution. Finally, to bridge the gap between independent contrast levels and tackle the common contrast-vanishing problem, we propose an inter-contrast mechanism that measures the discrepancy of each contrastive keyword node with respect to the instance distribution. Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
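As a reference point for the instance-level contrast, the sketch below shows a generic InfoNCE-style objective in PyTorch; it is an illustrative stand-in, not the paper's hierarchical objective, and `temperature` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss.

    anchor, positive: (d,) embeddings; negatives: (k, d) embeddings.
    An illustrative instance-level contrast, not the paper's exact
    hierarchical objective.
    """
    # Cosine similarity between the anchor and each candidate.
    pos_sim = F.cosine_similarity(anchor, positive, dim=-1) / temperature
    neg_sim = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1) / temperature
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim], dim=0)
    # The positive pair occupies index 0 of the logits.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```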
Event extraction, which aims to extract multiple events with their triggers and arguments, has attracted growing interest. Previous methods mitigate the problem of multiple-event extraction by predicting the arguments conditioned on the event trigger and event type, assuming that these arguments belong to a single event. However, this assumption does not hold in general, as one trigger may correspond to multiple events. Therefore, we present a unified framework called MatEE for trigger-sharing event extraction. It resolves this key bottleneck by effectively modeling the relations between arguments with an event matrix, in which trigger-sharing events are represented as multiple cliques. We verify the proposed method on three widely used benchmark datasets for event extraction. The experimental results show that it beats all advanced competitors, significantly improving the state-of-the-art performance in event extraction.
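One way to read the event matrix is as an adjacency over arguments whose maximal cliques are events; the toy sketch below illustrates this reading with networkx and is an assumption about the formulation, not the authors' implementation.

```python
import networkx as nx
import numpy as np

# Toy event matrix: entry [i][j] = 1 if arguments i and j co-occur in the
# same event (an illustrative reading, not the paper's exact matrix).
event_matrix = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

graph = nx.from_numpy_array(event_matrix)
# Each maximal clique corresponds to one event's argument set.
events = list(nx.find_cliques(graph))
print(events)  # e.g. [[0, 1], [2, 3]]
```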
Distance-based knowledge graph embedding methods show promising results on the link prediction task, where two topics have been widely studied: one is the ability to handle complex relations, such as N-to-1, 1-to-N, and N-to-N; the other is the ability to encode various relation patterns, such as symmetry/antisymmetry. However, existing methods fail to solve these two problems at the same time, which leads to unsatisfactory results. To mitigate this problem, we propose PairRE, a model that represents each relation with a pair of vectors. The paired vectors enable an adaptive adjustment of the margin in the loss function to fit different complex relations. Besides, PairRE is capable of encoding three important relation patterns: symmetry/antisymmetry, inverse, and composition. Given simple constraints on relation representations, PairRE can further encode subrelations. Experiments on link prediction benchmarks demonstrate the proposed key capabilities of PairRE. Moreover, we set a new state of the art on two knowledge graph datasets from the challenging Open Graph Benchmark.
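For reference, the published PairRE scoring function takes the form f_r(h, t) = -||h ∘ r^H - t ∘ r^T||, with entity embeddings constrained to unit norm; a minimal PyTorch sketch follows, where the norm order p=1 is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def pairre_score(head, tail, rel_head, rel_tail, p=1):
    """PairRE scoring: f_r(h, t) = -|| h * r_H - t * r_T ||_p.

    head/tail are entity embeddings (unit-normalized, per the paper's
    constraint); rel_head/rel_tail are the paired relation vectors.
    The norm order p is an assumed hyperparameter.
    """
    head = F.normalize(head, p=2, dim=-1)
    tail = F.normalize(tail, p=2, dim=-1)
    # Hadamard-project head and tail by their relation vectors, then
    # score by the (negative) distance between the projections.
    return -torch.norm(head * rel_head - tail * rel_tail, p=p, dim=-1)
```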
Ambiguous annotation criteria lead to divergence among Chinese Word Segmentation (CWS) datasets at various granularities. Multi-criteria Chinese word segmentation aims to capture the various annotation criteria across datasets and leverage their common underlying knowledge. In this paper, we propose a domain-adaptive segmenter that exploits the diverse criteria of various datasets. Our model is based on Bidirectional Encoder Representations from Transformers (BERT), which is responsible for introducing open-domain knowledge. Private and shared projection layers are proposed to capture domain-specific knowledge and common knowledge, respectively. We also optimize computational efficiency via distillation, quantization, and compiler optimization. Experiments show that our segmenter outperforms previous state-of-the-art (SOTA) models on 10 CWS datasets with superior efficiency.
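A minimal sketch of how private and shared projections over BERT features might be combined is shown below; the additive fusion, layer sizes, and per-criterion linear layers are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiCriteriaHead(nn.Module):
    """Shared + dataset-private projections over BERT token features.

    An illustrative reading of the private/shared projection design;
    the fusion rule (summation) and dimensions are assumptions.
    """

    def __init__(self, hidden_size, num_criteria, num_tags):
        super().__init__()
        self.shared = nn.Linear(hidden_size, hidden_size)
        self.private = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_criteria)
        )
        self.classifier = nn.Linear(hidden_size, num_tags)

    def forward(self, bert_hidden, criterion_id):
        # Combine criterion-specific and criterion-agnostic features,
        # then predict per-token segmentation tags.
        features = self.shared(bert_hidden) + self.private[criterion_id](bert_hidden)
        return self.classifier(torch.relu(features))
```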
Knowledge-driven conversation approaches have attracted remarkable research attention recently. However, generating an informative response that draws on multiple pieces of relevant knowledge without losing fluency and coherence is still one of the main challenges. To address this issue, this paper proposes a method that uses recurrent knowledge interaction among response decoding steps to incorporate appropriate knowledge. Furthermore, we introduce a knowledge copy mechanism using a knowledge-aware pointer network to copy words from external knowledge according to the knowledge attention distribution. Our joint neural conversation model, which integrates recurrent Knowledge-Interaction and knowledge Copy (KIC), performs well at generating informative responses. Experiments demonstrate that our model, with fewer parameters, yields significant improvements over competitive baselines on two datasets, Wizard-of-Wikipedia (average BLEU +87%; abs.: 0.034) and DuConv (average BLEU +20%; abs.: 0.047), with different knowledge formats (textual & structured) and different languages (English & Chinese).
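The copy mechanism can be pictured with the generic pointer-generator mixture below; the gating scalar `p_gen` and the scatter-based mixing follow the standard pointer-network formulation rather than KIC's exact parameterization.

```python
import torch

def copy_distribution(p_vocab, knowledge_attn, knowledge_token_ids, p_gen):
    """Mix generation and copy distributions, pointer-network style.

    p_vocab: (batch, vocab) decoder softmax over the vocabulary;
    knowledge_attn: (batch, k) attention over knowledge tokens;
    knowledge_token_ids: (batch, k) vocabulary ids of those tokens;
    p_gen: (batch, 1) gating scalar in [0, 1]. This mirrors the generic
    pointer-generator formulation, not KIC's exact parameterization.
    """
    mixed = p_gen * p_vocab
    # Scatter copy probability mass onto tokens appearing in the knowledge.
    mixed = mixed.scatter_add(1, knowledge_token_ids, (1 - p_gen) * knowledge_attn)
    return mixed
```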
Chinese Spelling Check (CSC) is the task of detecting and correcting spelling errors in Chinese natural language. Existing methods have attempted to incorporate similarity knowledge between Chinese characters. However, they treat the similarity knowledge as either an external input resource or mere heuristic rules. This paper proposes to incorporate phonological and visual similarity knowledge into language models for CSC via a specialized graph convolutional network (SpellGCN). The model builds a graph over the characters, and SpellGCN is trained to map this graph into a set of inter-dependent character classifiers. These classifiers are applied to the representations extracted by another network, such as BERT, enabling the whole network to be trained end-to-end. Experiments are conducted on three human-annotated datasets. Our method achieves superior performance, outperforming previous models by a large margin.
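A single graph-convolution step over the character similarity graph might look like the sketch below; using the refined character embeddings as classifier weights follows the SpellGCN idea, while the dimensions and normalization are assumptions.

```python
import torch
import torch.nn as nn

class SimilarityGCNLayer(nn.Module):
    """One graph-convolution layer over a character similarity graph.

    `adj` is assumed to be a row-normalized adjacency built from
    phonological/visual similarity between confusable characters.
    """

    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, char_embeddings, adj):
        # Aggregate similar characters' embeddings, then project.
        return torch.relu(adj @ char_embeddings @ self.weight)

# Classifier logits: BERT token representations dotted with the
# GCN-refined character embeddings (one row per character class):
#   logits = bert_hidden @ gcn_output.T
```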
Numerical reasoning over text, such as addition, subtraction, sorting, and counting, is a challenging machine reading comprehension task, since it requires both natural language understanding and arithmetic computation. To address this challenge, we propose a heterogeneous graph representation of the passage and question context needed for such reasoning, and we design a question-directed graph attention network to drive multi-step numerical reasoning over this context graph. Our model, which combines deep learning and graph reasoning, achieves remarkable results on benchmark datasets such as DROP.
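A simplified question-conditioned attention step over the context graph is sketched below; it is a generic illustration under assumed shapes, not the paper's exact operator.

```python
import torch
import torch.nn.functional as F

def question_directed_attention(node_feats, question_vec, adj_mask):
    """One question-conditioned attention step over a context graph.

    node_feats: (n, d) entity/number node states; question_vec: (d,)
    question summary; adj_mask: (n, n) boolean adjacency (self-loops
    included so every row has at least one neighbor). A simplified
    illustration under assumed shapes.
    """
    # Pairwise node scores plus a question-relevance bias per target node.
    scores = node_feats @ node_feats.T + node_feats @ question_vec
    scores = scores.masked_fill(~adj_mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ node_feats  # updated node states
```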
Aspect-term sentiment analysis (ATSA) is a long-standing challenge in natural language processing. It requires fine-grained semantic reasoning about a target entity that appears in the text. As manual annotation of aspects is laborious and time-consuming, the amount of labeled data available for supervised learning is limited. This paper proposes a semi-supervised method for the ATSA problem using a Transformer-based Variational Autoencoder. The model learns the latent distribution via variational inference. By disentangling the latent representation into aspect-specific sentiment and lexical context, our method induces the underlying sentiment prediction for unlabeled data, which then benefits the ATSA classifier. Our method is classifier-agnostic, i.e., the classifier is an independent module into which various supervised models can be integrated. Experimental results on SemEval 2014 Task 4 show that our method is effective with five different classifiers and outperforms these models by a significant margin.
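The training objective can be pictured as a negative ELBO with two disentangled latent blocks, as in the sketch below; splitting the KL term into two diagonal-Gaussian components is an assumption made for illustration, not the paper's exact loss.

```python
import torch

def elbo_loss(recon_log_prob, mu_sent, logvar_sent, mu_ctx, logvar_ctx):
    """Negative ELBO for a VAE with two disentangled latent blocks.

    One latent block is read as aspect-specific sentiment, the other as
    lexical context. recon_log_prob: (batch,) reconstruction log-likelihood;
    mu_*/logvar_*: (batch, z) Gaussian posterior parameters.
    """
    def kl(mu, logvar):
        # KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian.
        return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

    return -(recon_log_prob - kl(mu_sent, logvar_sent) - kl(mu_ctx, logvar_ctx)).mean()
```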
We propose AliMe Chat, an open-domain chatbot engine that integrates the joint results of Information Retrieval (IR) and Sequence-to-Sequence (Seq2Seq) generation models. AliMe Chat uses an attentive Seq2Seq-based rerank model to optimize the joint results. Extensive experiments show that our engine outperforms both IR- and generation-based models. We have launched AliMe Chat in a real-world industrial application and observe better results than with another public chatbot.
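The overall decision logic can be sketched as retrieve-then-rerank with a generation fallback; `retrieve`, `seq2seq_score`, and `generate` below are hypothetical callables (a retriever, the attentive Seq2Seq used as a scorer, and the same model used as a generator), and `threshold` is an assumed confidence cutoff.

```python
def answer(query, retrieve, seq2seq_score, generate, threshold=0.7):
    """IR + Seq2Seq hybrid with rerank, in the spirit of AliMe Chat.

    Retrieved candidates are reranked by the Seq2Seq model's confidence;
    if the best candidate falls below `threshold` (an assumed cutoff),
    the engine falls back to pure generation.
    """
    candidates = retrieve(query)  # IR candidates from a QA knowledge base
    if candidates:
        # Rerank retrieved answers with the attentive Seq2Seq scorer.
        best = max(candidates, key=lambda c: seq2seq_score(query, c))
        if seq2seq_score(query, best) >= threshold:
            return best
    # Retrieval is weak or empty: generate a response instead.
    return generate(query)
```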