Qing Yang


2023

Adaptive Attention for Sparse-based Long-sequence Transformer
Xuanyu Zhang | Zhepeng Lv | Qing Yang
Findings of the Association for Computational Linguistics: ACL 2023

Recently, Transformers have been widely used in various fields and have achieved remarkable results. However, it is still difficult for Transformer-based models to process long sequences because their self-attention scales quadratically with the sequence length. Although some models attempt to use sparse attention to reduce computational complexity, hand-crafted attention patterns are unable to select useful tokens adaptively according to the context. Thus, in this paper, we propose a novel efficient Transformer model with adaptive attention, A2-Former, for long-sequence modeling. It selects useful tokens automatically in sparse attention via learnable position vectors, which consist of meta position and offset position vectors. Because the learnable offset position is not an integer vector, we use interpolation to gather the corresponding vectors from the input embedding matrix by discrete indexes. Experiments on Long Range Arena (LRA), a systematic and unified benchmark with different tasks, show that our model achieves further performance improvements over other sparse-based Transformers.
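
The interpolation step described in the abstract can be pictured with a short sketch. This is an illustrative assumption, not the authors' code: names such as interpolated_gather, meta_pos, and offset are invented here, and the sketch only shows how a non-integer position could be resolved by linearly interpolating between the embeddings at its two neighbouring discrete indexes.

```python
import torch

def interpolated_gather(x, positions):
    """x: (seq_len, dim) token embeddings; positions: (k,) continuous indices."""
    seq_len = x.size(0)
    positions = positions.clamp(0, seq_len - 1)
    lo = positions.floor().long()             # discrete index below
    hi = positions.ceil().long()              # discrete index above
    frac = (positions - lo.float()).unsqueeze(-1)
    # Linear interpolation between the two neighbouring embeddings.
    return (1.0 - frac) * x[lo] + frac * x[hi]

seq_len, dim, k = 128, 64, 8
x = torch.randn(seq_len, dim)
meta_pos = torch.linspace(0, seq_len - 1, k)      # fixed meta positions (assumed)
offset = torch.randn(k, requires_grad=True)       # learnable, non-integer offsets
keys = interpolated_gather(x, meta_pos + offset)  # (k, dim) vectors for sparse attention
```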

Pre-trained Personalized Review Summarization with Effective Salience Estimation
Hongyan Xu | Hongtao Liu | Zhepeng Lv | Qing Yang | Wenjun Wang
Findings of the Association for Computational Linguistics: ACL 2023

Personalized review summarization in recommender systems is a challenging task: generating condensed summaries of product reviews while preserving their salient content. Recently, Pre-trained Language Models (PLMs) have become a new paradigm in text generation owing to their strong natural language comprehension ability. However, it is nontrivial to apply PLMs to personalized review summarization directly, since there is rich personalized information (e.g., user preferences and product characteristics) to be considered, which is crucial to estimating the salience of the input reviews. In this paper, we propose a pre-trained personalized review summarization method that effectively incorporates the personalized information of users and products into the salience estimation of the input reviews. We design a personalized encoder that identifies the salient content of the input sequence by jointly considering semantic and personalized information (i.e., ratings, user and product IDs, and linguistic features), yielding personalized representations for the input reviews and history summaries separately. Moreover, we design an interactive information selection mechanism that further identifies the salient content of the input reviews and selects relevant information from the history summaries. Results on real-world datasets show that our method outperforms state-of-the-art baselines and generates more readable summaries.
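
One way to read the "personalized encoder" idea is that user/product identity signals are fused with token semantics when scoring salience. The sketch below is a hedged illustration under that assumption; the module name PersonalizedSalience, the dimensions, and the fusion scheme are all ours, not the paper's.

```python
import torch
import torch.nn as nn

class PersonalizedSalience(nn.Module):
    def __init__(self, hidden=768, n_users=1000, n_products=1000, n_ratings=5):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, hidden)
        self.product_emb = nn.Embedding(n_products, hidden)
        self.rating_emb = nn.Embedding(n_ratings, hidden)
        self.score = nn.Linear(hidden * 2, 1)

    def forward(self, token_states, user_id, product_id, rating):
        # token_states: (batch, seq_len, hidden) from a pre-trained encoder
        persona = self.user_emb(user_id) + self.product_emb(product_id) + self.rating_emb(rating)
        persona = persona.unsqueeze(1).expand_as(token_states)
        # Salience of each token conditioned on the personalized context.
        return self.score(torch.cat([token_states, persona], dim=-1)).squeeze(-1)

model = PersonalizedSalience()
states = torch.randn(2, 50, 768)
salience = model(states, torch.tensor([3, 7]), torch.tensor([11, 2]), torch.tensor([4, 1]))
```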

Generating Extractive Answers: Gated Recurrent Memory Reader for Conversational Question Answering
Xuanyu Zhang | Qing Yang
Findings of the Association for Computational Linguistics: EMNLP 2023

Conversational question answering (CQA) is a more complicated task than traditional single-turn machine reading comprehension (MRC). Unlike large language models (LLMs) such as ChatGPT, CQA models need to extract answers from the given content to answer follow-up questions according to the conversation history. In this paper, we propose a novel architecture, the Gated Recurrent Memory Reader (GRMR), which integrates traditional extractive MRC models into a generalized sequence-to-sequence framework. After the passage is encoded, the decoder generates the extractive answers turn by turn. Unlike previous models that concatenate the previous questions and answers as context superficially and redundantly, our model uses less storage space and considers historical memory deeply and selectively. Experiments on the Conversational Question Answering (CoQA) dataset show that our model achieves results comparable to most models with the least space occupancy.
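
A rough way to picture the turn-by-turn extraction with a gated recurrent memory is the loop below. This is a minimal sketch under our own assumptions, not the GRMR architecture itself: the GRU memory update, the span-scoring heads, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

hidden = 256
memory_cell = nn.GRUCell(hidden, hidden)      # gated recurrent memory over turns
start_head = nn.Linear(hidden * 2, 1)
end_head = nn.Linear(hidden * 2, 1)

passage = torch.randn(1, 100, hidden)                    # encoded passage (batch=1)
questions = [torch.randn(1, hidden) for _ in range(3)]   # one question vector per turn
memory = torch.zeros(1, hidden)

for q in questions:
    memory = memory_cell(q, memory)           # fold the current question into memory
    ctx = memory.unsqueeze(1).expand(-1, passage.size(1), -1)
    fused = torch.cat([passage, ctx], dim=-1)
    start = start_head(fused).squeeze(-1).argmax(dim=-1)  # extractive span start
    end = end_head(fused).squeeze(-1).argmax(dim=-1)      # extractive span end
```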

PUNR: Pre-training with User Behavior Modeling for News Recommendation
Guangyuan Ma | Hongtao Liu | Xing W | Wanhui Qian | Zhepeng Lv | Qing Yang | Songlin Hu
Findings of the Association for Computational Linguistics: EMNLP 2023

News recommendation aims to predict click behaviors based on user behaviors. How to effectively model user representations is key to recommending preferred news. Existing works mostly focus on improvements in the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e., user behavior masking and user behavior generation, both aimed at effective user behavior modeling. First, we introduce the user behavior masking pre-training task to recover masked user behaviors based on their contextual behaviors. In this way, the model can capture much stronger and more comprehensive user news reading patterns. In addition, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on a real-world news benchmark show significant performance improvements over existing baselines.
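
The user-behavior-masking task can be pictured, by analogy with masked language modeling, as masking some clicked-news vectors in a user's history and reconstructing them from the surrounding behaviors. The snippet below is an illustrative assumption, not the PUNR code; the mask ratio, loss, and encoder are placeholders.

```python
import torch
import torch.nn as nn

hidden, history = 128, 20
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True), num_layers=2)
mask_token = nn.Parameter(torch.zeros(hidden))

behaviors = torch.randn(8, history, hidden)          # news vectors a user clicked
mask = torch.rand(8, history) < 0.15                 # mask ~15% of behaviors (assumed ratio)
corrupted = torch.where(mask.unsqueeze(-1), mask_token.expand_as(behaviors), behaviors)

recovered = encoder(corrupted)
# Reconstruction loss only on the masked positions.
loss = ((recovered - behaviors) ** 2)[mask].mean()
```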

MSCFFN: A New FFN with Multi-Space Cross to Accelerate Transformer
Tang Dongge | Qing Yang
Findings of the Association for Computational Linguistics: EMNLP 2023

Transformer models have achieved impressive success in various natural language processing tasks, but their use in some areas remains limited, and heavy computational complexity is one of the main limitations. Many model structures have been proposed to reduce this complexity, and some are quite effective. Previous research falls into two categories: one uses more effective training and inference strategies, and the other focuses on replacing the standard self-attention mechanism with linear attention methods. In contrast, we revisit the Transformer design and find that the feed-forward network (FFN) is also computationally expensive, especially when the hidden dimension is large. In this paper, we propose a new FFN structure, named MSCFFN, which splits the large matrix space into several small spaces to reduce the computational complexity and uses a Multi-Space Cross method to preserve accuracy. To the best of our knowledge, this is the first work to redesign the FFN to accelerate Transformers. We experimentally validate the effectiveness of the proposed method on the Long-Range Arena benchmark, and the results show that MSCFFN achieves faster speed with similar or even better accuracy.
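
The splitting idea can be illustrated with a short sketch: replacing one large up/down projection with several small per-space projections reduces the matrix-multiplication cost by roughly the number of spaces. The exact Multi-Space Cross operation is not described in the abstract, so the "cross" step below is purely an illustrative assumption, as is the SplitFFN name.

```python
import torch
import torch.nn as nn

class SplitFFN(nn.Module):
    def __init__(self, d_model=512, expansion=4, n_spaces=4):
        super().__init__()
        self.n_spaces = n_spaces
        d_small = d_model // n_spaces
        self.up = nn.Linear(d_small, d_small * expansion)
        self.down = nn.Linear(d_small * expansion, d_small)
        self.act = nn.GELU()

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, s, d = x.shape
        x = x.view(b, s, self.n_spaces, -1)     # split into several small spaces
        x = self.down(self.act(self.up(x)))     # small FFN applied per space
        # Placeholder "cross" step: let neighbouring spaces exchange information.
        x = x + x.roll(shifts=1, dims=2)
        return x.view(b, s, d)

ffn = SplitFFN()
y = ffn(torch.randn(2, 10, 512))
```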

Contrastive Pre-training for Personalized Expert Finding
Qiyao Peng | Hongtao Liu | Zhepeng Lv | Qing Yang | Wenjun Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Expert finding helps route questions to suitable users to answer in Community Question Answering (CQA) platforms. Hence it is essential to learn accurate representations of experts and questions from the question text. Recently, the pre-training and fine-tuning paradigm has proven powerful for natural language understanding, which holds potential for better question modeling and expert finding. Inspired by this, we propose a CQA-domain Contrastive Pre-training framework for Expert Finding, named CPEF, which learns more comprehensive question representations. Specifically, considering that question titles and bodies are semantically complementary, during the domain pre-training phase we propose a title-body contrastive learning task to enhance question representations: it directly treats a question's title and the corresponding body as positive samples of each other, instead of designing extra data-augmentation strategies. Furthermore, a personalized tuning network is proposed to inject the personalized preferences of different experts during the fine-tuning phase. Extensive experimental results on six real-world datasets demonstrate that our method achieves superior performance for expert finding.
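
The title-body contrastive task maps naturally onto a standard in-batch contrastive objective: a question's title and body form the positive pair, and other questions in the batch serve as negatives. The loss below is a minimal sketch of that reading, with an assumed temperature and symmetric formulation; it is not the released CPEF implementation.

```python
import torch
import torch.nn.functional as F

def title_body_contrastive_loss(title_vecs, body_vecs, temperature=0.05):
    """title_vecs, body_vecs: (batch, dim) encoded by a shared question encoder."""
    title_vecs = F.normalize(title_vecs, dim=-1)
    body_vecs = F.normalize(body_vecs, dim=-1)
    logits = title_vecs @ body_vecs.t() / temperature   # (batch, batch) similarities
    labels = torch.arange(logits.size(0))                # diagonal pairs are positives
    # Symmetric loss: title -> body and body -> title.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

loss = title_body_contrastive_loss(torch.randn(16, 768), torch.randn(16, 768))
```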

2022

TranS: Transition-based Knowledge Graph Embedding with Synthetic Relation Representation
Xuanyu Zhang | Qing Yang | Dongliang Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Knowledge graph embedding (KGE) aims to learn continuous vector representations of relations and entities in a knowledge graph (KG). Recently, transition-based KGE methods have become popular and achieved promising performance. However, scoring patterns like that of TransE are not suitable for complex scenarios where the same entity pair has different relations. Although some models employ entity-relation interaction or projection to improve entity representations for one-to-many/many-to-one/many-to-many complex relations, they still follow the traditional scoring pattern, in which only a single relation vector in the relation part is used to translate the head entity to the tail entity (or variants thereof). Moreover, recent research shows that entity representations only need to consider entities and their interactions to achieve better performance. Thus, in this paper, we propose a novel transition-based method, TranS, for KGE. To address these issues, the single relation vector of the traditional scoring pattern is replaced by a synthetic relation representation built from entity-relation interactions, while the entity part retains its independence through entity-entity interactions. Experiments on a large KG dataset, ogbl-wikikg2, show that our model achieves state-of-the-art results.
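
To make the contrast with the traditional scoring pattern concrete, the sketch below puts the TransE score next to a hypothetical "synthetic relation" score in which the relation part interacts with both entities. The interaction form (elementwise products) is an illustrative assumption and does not reproduce the published TranS scoring function.

```python
import torch

def transe_score(h, r, t):
    # Traditional pattern: a single relation vector translates head to tail.
    return -torch.norm(h + r - t, p=1, dim=-1)

def synthetic_relation_score(h, r, t):
    # Hypothetical synthesis: the relation part is built from entity-relation interactions.
    r_syn = r + r * h + r * t
    return -torch.norm(h + r_syn - t, p=1, dim=-1)

h, r, t = torch.randn(3, 200), torch.randn(3, 200), torch.randn(3, 200)
print(transe_score(h, r, t), synthetic_relation_score(h, r, t))
```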

Instance-Guided Prompt Learning for Few-Shot Text Matching
Jia Du | Xuanyu Zhang | Siyi Wang | Kai Wang | Yanquan Zhou | Lei Li | Qing Yang | Dongliang Xu
Findings of the Association for Computational Linguistics: EMNLP 2022

Few-shot text matching is a practical natural language processing (NLP) technique for determining whether two texts are semantically identical. Existing methods primarily design patterns that reformulate text matching as a pre-training-style task with uniform prompts across all instances, but they fail to take into account the connection between prompts and instances. This paper argues that dynamically strengthening the correlation between particular instances and the prompts is necessary, because fixed prompts cannot adequately fit all diverse instances at inference time. We propose IGATE: Instance-Guided prompt leArning for few-shoT tExt matching, a novel pluggable prompt learning method. The gate mechanism used by IGATE, which sits between the embedding layer and the PLM encoder, uses the semantics of instances to regulate the effect of the gate on the prompt tokens. Experimental results show that IGATE achieves state-of-the-art performance on MRPC and QQP, outperforming strong baselines. The code will be released on GitHub.
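
An instance-conditioned gate over soft prompt tokens can be sketched as follows. This is a hedged illustration under our own assumptions, not the IGATE release: the InstanceGate name, the mean-pooled instance summary, and the sigmoid gating are placeholders for the mechanism the abstract describes between the embedding layer and the PLM encoder.

```python
import torch
import torch.nn as nn

class InstanceGate(nn.Module):
    def __init__(self, hidden=768, n_prompt=10):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)  # soft prompt tokens
        self.gate = nn.Linear(hidden, n_prompt * hidden)                  # instance -> per-token gate
        self.n_prompt, self.hidden = n_prompt, hidden

    def forward(self, instance_emb):
        # instance_emb: (batch, seq, hidden) embeddings of the text pair
        summary = instance_emb.mean(dim=1)                                # crude instance summary
        g = torch.sigmoid(self.gate(summary)).view(-1, self.n_prompt, self.hidden)
        gated_prompt = g * self.prompt                                    # instance-guided prompts
        # Prepend the gated prompts before feeding the PLM encoder.
        return torch.cat([gated_prompt, instance_emb], dim=1)

layer = InstanceGate()
inputs = layer(torch.randn(4, 32, 768))   # (4, 42, 768) to pass to the PLM encoder
```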