Jian Li


2024

pdf bib
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
Wenhao Yu | Hongming Zhang | Xiaoman Pan | Peixin Cao | Kaixin Ma | Jian Li | Hongwei Wang | Dong Yu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Retrieval-augmented language model (RALM) represents a significant advancement in mitigating factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed, and the retrieval of irrelevant data can mislead the response generation. Moreover, standard RALMs frequently neglect their intrinsic knowledge due to the interference from retrieved information. In instances where the retrieved information is irrelevant, RALMs should ideally utilize their intrinsic knowledge or, in the absence of both intrinsic and retrieved knowledge, opt to respond with “unknown” to avoid hallucination. In this paper, we introduces Chain-of-Note (CoN), a novel approach to improve robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for each retrieved document, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. Our experimental results show that GPT-4, when equipped with CoN, outperforms the Chain-of-Thought approach. Besides, we utilized GPT-4 to create 10K CoN data, subsequently trained on smaller models like OPT and LLaMa-2. Our experiments across four open-domain QA benchmarks show that fine-tuned RALMs equipped with CoN significantly outperform standard fine-tuned RALMs.

pdf bib
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng | Yifan Yang | Jian Li | Yong Dai | Tianhao Hu | Peixin Cao | Nan Du | Xiaolong Li
Findings of the Association for Computational Linguistics: ACL 2024

Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, continuously updating LLMs for alignment raises a distribution gap between model-generated samples and human-annotated responses, hindering training effectiveness. To mitigate this issue, previous methods require additional preference annotation on newly generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an Adversarial Preference Optimization (APO) framework, in which the LLM and the reward model update alternatively via a min-max game. Through adversarial training, the reward model can adapt to the shifted generation distribution of the LLM without any additional annotation. With comprehensive experiments, we find the proposed adversarial training framework further enhances existing alignment baselines in terms of LLM helpfulness and harmlessness. The code is at https://github.com/Linear95/APO.

pdf bib
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li | Haojing Huang | Yujia Zhang | Pengfei Xu | Xi Chen | Rui Song | Lida Shi | Jingwen Wang | Hao Xu
Findings of the Association for Computational Linguistics: EMNLP 2024

Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying preference degrees within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their ability to understand the degree of preference. Extensive experiments are conducted on two widely used datasets of different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boost their performance to achieve state-of-the-art performance. We also conduct detailed analyses to offer comprehensive insights into SPO, which verifies its effectiveness. The code is available at https://github.com/lijian16/SPO.

pdf bib
EmoPrompt-ECPE: Emotion Knowledge-aware Prompt-tuning for Emotion-Cause Pair Extraction
Xue Gu | Zhihan Zhou | Ziyao Meng | Jian Li | Tiago Gomes | Adriano Tavares | Hao Xu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Emotion-cause pair extraction (ECPE) main focus is on extracting all potential emotion clauses and corresponding cause clauses from unannotated documents. Existing methods achieve promising results with the help of fine-tuning and prompt paradigms, but they present three downsides. First, most approaches cannot distinguish between the emotion-cause pairs that belong to different types of emotions, limiting the existing approaches’ applicability. Second, existing prompt methods utilize a one-to-one mapping relation to achieve label words to category mapping, which brings considerable bias to the results. Third, existing methods achieve the cause extraction task supported by explicit semantic understanding or basic prompt templates, ignoring the implicit information contained in the cause clauses themselves. To solve these issues, we propose an Emotion knowledge-aware Prompt-tuning for Emotion-Cause Pair Extraction (EmoPrompt-ECPE) method, which integrate the knowledge of emotion categories in the ECPE task and mine the implicit knowledge of cause clauses. Specifically, we inject the latent knowledge of the cause clauses and the emotion types into the prompt template. Besides, we extend the emotion labels for many-to-one mapping of label words to categories with an external emotion word base. Furthermore, we utilize the cosine similarity filtering of the label word base to reduce the noise caused by knowledge introduction. Experiments on both Chinese and English benchmark datasets show that our approach can achieve state-of-the-art results. Our code and data can be found at: https://github.com/xy-xiaotudou/EmoPrompt-ECPE.

2022

pdf bib
Controlled Text Generation Using Dictionary Prior in Variational Autoencoders
Xianghong Fang | Jian Li | Lifeng Shang | Xin Jiang | Qun Liu | Dit-Yan Yeung
Findings of the Association for Computational Linguistics: ACL 2022

While variational autoencoders (VAEs) have been widely applied in text generation tasks, they are troubled by two challenges: insufficient representation capacity and poor controllability. The former results from the posterior collapse and restrictive assumption, which impede better representation learning. The latter arises as continuous latent variables in traditional formulations hinder VAEs from interpretability and controllability. In this paper, we propose Dictionary Prior (DPrior), a new data-driven prior that enjoys the merits of expressivity and controllability. To facilitate controlled text generation with DPrior, we propose to employ contrastive learning to separate the latent space into several parts. Extensive experiments on both language modeling and controlled text generation demonstrate the effectiveness of the proposed approach.

pdf bib
MINER: Multi-Interest Matching Network for News Recommendation
Jian Li | Jieming Zhu | Qiwei Bi | Guohao Cai | Lifeng Shang | Zhenhua Dong | Xin Jiang | Qun Liu
Findings of the Association for Computational Linguistics: ACL 2022

Personalized news recommendation is an essential technique to help users find interested news. Accurately matching user’s interests and candidate news is the key to news recommendation. Most existing methods learn a single user embedding from user’s historical behaviors to represent the reading interest. However, user interest is usually diverse and may not be adequately modeled by a single user embedding. In this paper, we propose a poly attention scheme to learn multiple interest vectors for each user, which encodes the different aspects of user interest. We further propose a disagreement regularization to make the learned interests vectors more diverse. Moreover, we design a category-aware attention weighting strategy that incorporates the news category information as explicit interest signals into the attention mechanism. Extensive experiments on the MIND news recommendation benchmark demonstrate that our approach significantly outperforms existing state-of-the-art methods.

pdf bib
MTRec: Multi-Task Learning over BERT for News Recommendation
Qiwei Bi | Jian Li | Lifeng Shang | Xin Jiang | Qun Liu | Hanfang Yang
Findings of the Association for Computational Linguistics: ACL 2022

Existing news recommendation methods usually learn news representations solely based on news titles. To sufficiently utilize other fields of news information such as category and entities, some methods treat each field as an additional feature and combine different feature vectors with attentive pooling. With the adoption of large pre-trained models like BERT in news recommendation, the above way to incorporate multi-field information may encounter challenges: the shallow feature encoding to compress the category and entity information is not compatible with the deep BERT encoding. In this paper, we propose a multi-task method to incorporate the multi-field information into BERT, which improves its news encoding capability. Besides, we modify the gradients of auxiliary tasks based on their gradient conflicts with the main task, which further boosts the model performance. Extensive experiments on the MIND news recommendation benchmark show the effectiveness of our approach.

pdf bib
DIGAT: Modeling News Recommendation with Dual-Graph Interaction
Zhiming Mao | Jian Li | Hongru Wang | Xingshan Zeng | Kam-Fai Wong
Findings of the Association for Computational Linguistics: EMNLP 2022

News recommendation (NR) is essential for online news services. Existing NR methods typically adopt a news-user representation learning framework, facing two potential limitations. First, in news encoder, single candidate news encoding suffers from an insufficient semantic information problem. Second, existing graph-based NR methods are promising but lack effective news-user feature interaction, rendering the graph-based recommendation suboptimal. To overcome these limitations, we propose dual-interactive graph attention networks (DIGAT) consisting of news- and user-graph channels. In the news-graph channel, we enrich the semantics of single candidate news by incorporating the semantically relevant news information with a semantic-augmented graph (SAG). In the user-graph channel, multi-level user interests are represented with a news-topic graph. Most notably, we design a dual-graph interaction process to perform effective feature interaction between the news and user graphs, which facilitates accurate news-user representation matching. Experiment results on the benchmark dataset MIND show that DIGAT outperforms existing news recommendation methods. Further ablation studies and analyses validate the effectiveness of (1) semantic-augmented news graph modeling and (2) dual-graph interaction.

2021

pdf bib
Natural Language Processing Meets Quantum Physics: A Survey and Categorization
Sixuan Wu | Jian Li | Peng Zhang | Yue Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent research has investigated quantum NLP, designing algorithms that process natural language in quantum computers, and also quantum-inspired algorithms that improve NLP performance on classical computers. In this survey, we review representative methods at the intersection of NLP and quantum physics in the past ten years, categorizing them according to the use of quantum theory, the linguistic targets that are modeled, and the downstream application. The literature review ends with a discussion on the key factors to the success that has been achieved by existing work, as well as challenges ahead, with the goal of better understanding the promises and further directions.

2019

pdf bib
Relation Extraction with Temporal Reasoning Based on Memory Augmented Distant Supervision
Jianhao Yan | Lin He | Ruqin Huang | Jian Li | Ying Liu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Distant supervision (DS) is an important paradigm for automatically extracting relations. It utilizes existing knowledge base to collect examples for the relation we intend to extract, and then uses these examples to automatically generate the training data. However, the examples collected can be very noisy, and pose significant challenge for obtaining high quality labels. Previous work has made remarkable progress in predicting the relation from distant supervision, but typically ignores the temporal relations among those supervising instances. This paper formulates the problem of relation extraction with temporal reasoning and proposes a solution to predict whether two given entities participate in a relation at a given time spot. For this purpose, we construct a dataset called WIKI-TIME which additionally includes the valid period of a certain relation of two entities in the knowledge base. We propose a novel neural model to incorporate both the temporal information encoding and sequential reasoning. The experimental results show that, compared with the best of existing models, our model achieves better performance in both WIKI-TIME dataset and the well-studied NYT-10 dataset.

pdf bib
Information Aggregation for Multi-Head Attention with Routing-by-Agreement
Jian Li | Baosong Yang | Zi-Yi Dou | Xing Wang | Michael R. Lyu | Zhaopeng Tu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces. Concerning the information aggregation, a common practice is to use a concatenation followed by a linear transformation, which may not fully exploit the expressiveness of multi-head attention. In this work, we propose to improve the information aggregation for multi-head attention with a more powerful routing-by-agreement algorithm. Specifically, the routing algorithm iteratively updates the proportion of how much a part (i.e. the distinct information learned from a specific subspace) should be assigned to a whole (i.e. the final output representation), based on the agreement between parts and wholes. Experimental results on linguistic probing tasks and machine translation tasks prove the superiority of the advanced information aggregation over the standard linear transformation.

2018

pdf bib
Multi-Head Attention with Disagreement Regularization
Jian Li | Zhaopeng Tu | Baosong Yang | Michael R. Lyu | Tong Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions. In this work, we introduce a disagreement regularization to explicitly encourage the diversity among multiple attention heads. Specifically, we propose three types of disagreement regularization, which respectively encourage the subspace, the attended positions, and the output representation associated with each attention head to be different from other heads. Experimental results on widely-used WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness and universality of the proposed approach.