Ji-Rong Wen


2022

pdf bib
MISC: A Mixed Strategy-Aware Model integrating COMET for Emotional Support Conversation
Quan Tu | Yanran Li | Jianwei Cui | Bin Wang | Ji-Rong Wen | Rui Yan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Applying existing methods to emotional support conversation—which provides valuable assistance to people who are in need—has two major limitations: (a) they generally employ a conversation-level emotion label, which is too coarse-grained to capture user’s instant mental state; (b) most of them focus on expressing empathy in the response(s) rather than gradually reducing user’s distress. To address the problems, we propose a novel model MISC, which firstly infers the user’s fine-grained emotional status, and then responds skillfully using a mixture of strategy. Experimental results on the benchmark dataset demonstrate the effectiveness of our method and reveal the benefits of fine-grained emotion understanding as well as mixed-up strategy modeling.

pdf bib
There Are a Thousand Hamlets in a Thousand People’s Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory
Tingchen Fu | Xueliang Zhao | Chongyang Tao | Ji-Rong Wen | Rui Yan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge-grounded conversation (KGC) shows great potential in building an engaging and knowledgeable chatbot, and knowledge selection is a key ingredient in it. However, previous methods for knowledge selection only concentrate on the relevance between knowledge and dialogue context, ignoring the fact that age, hobby, education and life experience of an interlocutor have a major effect on his or her personal preference over external knowledge. Without taking the personalization issue into account, it is difficult for existing dialogue systems to select the proper knowledge and generate persona-consistent responses.In this work, we introduce personal memory into knowledge selection in KGC to address the personalization issue. We propose a variational method to model the underlying relationship between one’s personal memory and his or her selection of knowledge, and devise a learning scheme in which the forward mapping from personal memory to knowledge and its inverse mapping is included in a closed loop so that they could teach each other. Experiment results show that our methods outperform existing KGC methods significantly on both automatic evaluation and human evaluation.

pdf bib
Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network
Zheng Gong | Kun Zhou | Xin Zhao | Jing Sha | Shijin Wang | Ji-Rong Wen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we study how to continually pre-train language models for improving the understanding of math problems. Specifically, we focus on solving a fundamental challenge in modeling math problems, how to fuse the semantics of textual description and formulas, which are highly different in essence. To address this issue, we propose a new approach called COMUS to continually pre-train language models for math problem understanding with syntax-aware memory network. In this approach, we first construct the math syntax graph to model the structural semantic information, by combining the parsing trees of the text and formulas, and then design the syntax-aware memory networks to deeply fuse the features from the graph and text. With the help of syntax relations, we can model the interaction between the token from the text and its semantic-related nodes within the formulas, which is helpful to capture fine-grained semantic correlations between texts and formulas. Besides, we devise three continual pre-training tasks to further align and fuse the representations of the text and math syntax graph. Experimental results on four tasks in the math domain demonstrate the effectiveness of our approach. Our code and data are publicly available at the link: bluehttps://github.com/RUCAIBox/COMUS.

pdf bib
Debiased Contrastive Learning of Unsupervised Sentence Representations
Kun Zhou | Beichen Zhang | Xin Zhao | Ji-Rong Wen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recently, contrastive learning has been shown to be effective in improving pre-trained language models (PLM) to derive high-quality sentence representations. It aims to pull close positive examples to enhance the alignment while push apart irrelevant negatives for the uniformity of the whole representation space.However, previous works mostly adopt in-batch negatives or sample from training data at random. Such a way may cause the sampling bias that improper negatives (false negatives and anisotropy representations) are used to learn sentence representations, which will hurt the uniformity of the representation space.To address it, we present a new framework DCLR (Debiased Contrastive Learning of unsupervised sentence Representations) to alleviate the influence of these improper negatives.In DCLR, we design an instance weighting method to punish false negatives and generate noise-based negatives to guarantee the uniformity of the representation space.Experiments on seven semantic textual similarity tasks show that our approach is more effective than competitive baselines. Our code and data are publicly available at the link: bluehttps://github.com/RUCAIBox/DCLR.

pdf bib
Learning to Transfer Prompts for Text Generation
Junyi Li | Tianyi Tang | Jian-Yun Nie | Ji-Rong Wen | Xin Zhao
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Pretrained language models (PLMs) have made remarkable progress in text generation tasks via fine-tuning. While, it is challenging to fine-tune PLMs in a data-scarce situation. Therefore, it is non-trivial to develop a general and lightweight model that can adapt to various text generation tasks based on PLMs. To fulfill this purpose, the recent prompt-based learning offers a potential solution. In this paper, we improve this technique and propose a novel prompt-based method (PTG) for text generation in a transferable setting. First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks. To consider both task- and instance-level information, we design an adaptive attention mechanism to derive the target prompts. For each data instance, PTG learns a specific target prompt by attending to highly relevant source prompts. In extensive experiments, PTG yields competitive or better results than fine-tuning methods. We release our source prompts as an open resource, where users can add or reuse them to improve new text generation tasks for future research. Code and data can be available at https://github.com/RUCAIBox/Transfer-Prompts-for-Text-Generation.

pdf bib
ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models
Junyi Li | Tianyi Tang | Zheng Gong | Lixin Yang | Zhuohao Yu | Zhipeng Chen | Jingyuan Wang | Xin Zhao | Ji-Rong Wen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Nowadays, pretrained language models (PLMs) have dominated the majority of NLP tasks. While, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, memory, comprehension, reasoning, and composition, to measure ten widely-used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for more deep and detailed analysis on the language abilities of PLMs. This paper can guide the future work to select, apply, and design PLMs for specific tasks. We have made all the details of experiments publicly available at https://github.com/RUCAIBox/ElitePLM.

pdf bib
Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation
Hanxun Zhong | Zhicheng Dou | Yutao Zhu | Hongjin Qian | Ji-Rong Wen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Personalized dialogue systems explore the problem of generating responses that are consistent with the user’s personality, which has raised much attention in recent years. Existing personalized dialogue systems have tried to extract user profiles from dialogue history to guide personalized response generation. Since the dialogue history is usually long and noisy, most existing methods truncate the dialogue history to model the user’s personality. Such methods can generate some personalized responses, but a large part of dialogue history is wasted, leading to sub-optimal performance of personalized response generation. In this work, we propose to refine the user dialogue history on a large scale, based on which we can handle more dialogue history and obtain more abundant and accurate persona information. Specifically, we design an MSP model which consists of three personal information refiners and a personalized response generator. With these multi-level refiners, we can sparsely extract the most valuable information (tokens) from the dialogue history and leverage other similar users’ data to enhance personalization. Experimental results on two real-world datasets demonstrate the superiority of our model in generating more informative and personalized responses.

pdf bib
Finding the Dominant Winning Ticket in Pre-Trained Language Models
Zhuocheng Gong | Di He | Yelong Shen | Tie-Yan Liu | Weizhu Chen | Dongyan Zhao | Ji-Rong Wen | Rui Yan
Findings of the Association for Computational Linguistics: ACL 2022

The Lottery Ticket Hypothesis suggests that for any over-parameterized model, a small subnetwork exists to achieve competitive performance compared to the backbone architecture. In this paper, we study whether there is a winning lottery ticket for pre-trained language models, which allow the practitioners to fine-tune the parameters in the ticket but achieve good downstream performance. To achieve this, we regularize the fine-tuning process with L1 distance and explore the subnetwork structure (what we refer to as the “dominant winning ticket”). Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix. Strikingly, we find that a dominant winning ticket that takes up 0.05% of the parameters can already achieve satisfactory performance, indicating that the PLM is significantly reducible during fine-tuning.

pdf bib
Great~Truths~are ~Always ~Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models
Jinhao Jiang | Kun Zhou | Ji-Rong Wen | Xin Zhao
Findings of the Association for Computational Linguistics: NAACL 2022

Commonsense reasoning in natural language is a desired ability of artificial intelligent systems. For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder that models a commonsense knowledge graph (CSKG).Despite the effectiveness, these approaches are built on heavy architectures, and can’t clearly explain how external knowledge resources improve the reasoning capacity of PTMs. Considering this issue, we conduct a deep empirical analysis, and find that it is indeed relation features from CSKGs (but not node features) that mainly contribute to the performance improvement of PTMs. Based on this finding, we design a simple MLP-based knowledge encoder that utilizes statistical relation paths as features. Extensive experiments conducted on five benchmarks demonstrate the effectiveness of our approach, which also largely reduces the parameters for encoding CSKGs.Our codes and data are publicly available at https://github.com/RUCAIBox/SAFE.

pdf bib
Semantic Sentence Matching via Interacting Syntax Graphs
Chen Xu | Jun Xu | Zhenhua Dong | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

Studies have shown that the sentence’s syntactic structures are important for semantic sentence matching. A typical approach is encoding each sentence’s syntactic structure into an embedding vector, which can be combined with other features to predict the final matching scores. Though successes have been observed, embedding the whole syntactic structures as one vector inevitably overlooks the fine-grained syntax matching patterns, e.g. the alignment of specific term dependencies relations in the two inputted sentences. In this paper, we formalize the task of semantic sentence matching as a problem of graph matching in which each sentence is represented as a directed graph according to its syntactic structures. The syntax matching patterns (i.e. similar syntactic structures) between two sentences, therefore, can be extracted as the sub-graph structure alignments. The proposed method, referred to as Interacted Syntax Graphs (ISG), represents two sentences’ syntactic alignments as well as their semantic matching signals into one association graph. After that, the neural quadratic assignment programming (QAP) is adapted to extract syntactic matching patterns from the association graph. In this way, the syntactic structures fully interact in a fine granularity during the matching process. Experimental results on three public datasets demonstrated that ISG can outperform the state-of-the-art baselines effectively and efficiently. The empirical analysis also showed that ISG can match sentences in an interpretable way.

pdf bib
Optimal Partial Transport Based Sentence Selection for Long-form Document Matching
Weijie Yu | Liang Pang | Jun Xu | Bing Su | Zhenhua Dong | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

One typical approach to long-form document matching is first conducting alignment between cross-document sentence pairs, and then aggregating all of the sentence-level matching signals. However, this approach could be problematic because the alignment between documents is partial — despite two documents as a whole are well-matched, most of the sentences could still be dissimilar. Those dissimilar sentences lead to spurious sentence-level matching signals which may overwhelm the real ones, increasing the difficulties of learning the matching function. Therefore, accurately selecting the key sentences for document matching is becoming a challenging issue. To address the issue, we propose a novel matching approach that equips existing document matching models with an Optimal Partial Transport (OPT) based component, namely OPT-Match, which selects the sentences that play a major role in matching. Enjoying the partial transport properties of OPT, the selected key sentences can not only effectively enhance the matching accuracy, but also be explained as the rationales for the matching results. Extensive experiments on four publicly available datasets demonstrated that existing methods equipped with OPT-Match consistently outperformed the corresponding underlying methods. Evaluations also showed that the key sentences selected by OPT-Match were consistent with human-provided rationales.

pdf bib
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models
Ze-Feng Gao | Peiyu Liu | Wayne Xin Zhao | Zhong-Yi Lu | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

Recently, Mixture-of-Experts (short as MoE) architecture has achieved remarkable success in increasing the model capacity of large-scale language models. However, MoE requires incorporating significantly more parameters than the base model being extended. In this paper, we propose building a parameter-efficient MoE architecture by sharing information across experts. We adopt matrix product operator (MPO, a tensor decomposition from quantum many-body physics) to reconstruct the parameter matrix in the expert layer and increase model capacity for pre-trained language models by sharing parameters of the central tensor (containing the core information) among different experts while enabling the specificity through the auxiliary tensors (complementing the central tensor) of different experts. To address the unbalanced optimization issue, we further design the gradient mask strategy for the MPO-based MoE architecture. Extensive experiments based on T5 and GPT-2 show improved performance and efficiency of the pre-trained language model (27.2x reduction in total parameters for the superior model performance, compared with the Switch Transformers). Our code is publicly available at https://github.com/RUCAIBox/MPO/MPOE.

pdf bib
Context-Tuning: Learning Contextualized Prompts for Natural Language Generation
Tianyi Tang | Junyi Li | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 29th International Conference on Computational Linguistics

Recently, pretrained language models (PLMs) have had exceptional success in language generation. To leverage the rich knowledge encoded by PLMs, a simple yet powerful paradigm is to use prompts in the form of either discrete tokens or continuous embeddings. In existing studies, these prompting methods are typically independent of the inputs, lacking sufficient consideration of input semantics. To address this issue, we propose a novel continuous prompting approach, called context-tuning, to fine-tuning PLMs for natural language generation. Firstly, the prompts are derived based on the input text to elicit useful knowledge from PLMs for generation. We refer to such prompts as contextualized prompts. Secondly, we use continuous inverse prompting to improve the process of natural language generation by modeling an inverse generation process from output to input, making the generated text more relevant to the inputs. Furthermore, we utilize a lightweight context-tuning method that fine-tunes only 0.12% of the parameters while maintaining good performance. Our code is publicly available at https://github.com/RUCAIBox/Context-Tuning.

2021

pdf bib
A Joint Model for Dropped Pronoun Recovery and Conversational Discourse Parsing in Chinese Conversational Speech
Jingxuan Yang | Kerui Xu | Jun Xu | Si Li | Sheng Gao | Jun Guo | Nianwen Xue | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we present a neural model for joint dropped pronoun recovery (DPR) and conversational discourse parsing (CDP) in Chinese conversational speech. We show that DPR and CDP are closely related, and a joint model benefits both tasks. We refer to our model as DiscProReco, and it first encodes the tokens in each utterance in a conversation with a directed Graph Convolutional Network (GCN). The token states for an utterance are then aggregated to produce a single state for each utterance. The utterance states are then fed into a biaffine classifier to construct a conversational discourse graph. A second (multi-relational) GCN is then applied to the utterance states to produce a discourse relation-augmented representation for the utterances, which are then fused together with token states in each utterance as input to a dropped pronoun recovery layer. The joint model is trained and evaluated on a new Structure Parsing-enhanced Dropped Pronoun Recovery (SPDPR) data set that we annotated with both two types of information. Experimental results on the SPDPR dataset and other benchmarks show that DiscProReco significantly outperforms the state-of-the-art baselines of both tasks.

pdf bib
A Pre-training Strategy for Zero-Resource Response Selection in Knowledge-Grounded Conversations
Chongyang Tao | Changyu Chen | Jiazhan Feng | Ji-Rong Wen | Rui Yan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, many studies are emerging towards building a retrieval-based dialogue system that is able to effectively leverage background knowledge (e.g., documents) when conversing with humans. However, it is non-trivial to collect large-scale dialogues that are naturally grounded on the background documents, which hinders the effective and adequate training of knowledge selection and response matching. To overcome the challenge, we consider decomposing the training of the knowledge-grounded response selection into three tasks including: 1) query-passage matching task; 2) query-dialogue history matching task; 3) multi-turn response matching task, and joint learning all these tasks in a unified pre-trained language model. The former two tasks could help the model in knowledge selection and comprehension, while the last task is designed for matching the proper response with the given query and background knowledge (dialogue history). By this means, the model can be learned to select relevant knowledge and distinguish proper response, with the help of ad-hoc retrieval corpora and a large number of ungrounded multi-turn dialogues. Experimental results on two benchmarks of knowledge-grounded response selection indicate that our model can achieve comparable performance with several existing methods that rely on crowd-sourced data for training.

pdf bib
Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators
Peiyu Liu | Ze-Feng Gao | Wayne Xin Zhao | Zhi-Yuan Xie | Zhong-Yi Lu | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents a novel pre-trained language models (PLM) compression approach based on the matrix product operator (short as MPO) from quantum many-body physics. It can decompose an original matrix into central tensors (containing the core information) and auxiliary tensors (with only a small proportion of parameters). With the decomposed MPO structure, we propose a novel fine-tuning strategy by only updating the parameters from the auxiliary tensors, and design an optimization algorithm for MPO-based approximation over stacked network architectures. Our approach can be applied to the original or the compressed PLMs in a general way, which derives a lighter network and significantly reduces the parameters to be fine-tuned. Extensive experiments have demonstrated the effectiveness of the proposed approach in model compression, especially the reduction in fine-tuning parameters (91% reduction on average). The code to reproduce the results of this paper can be found at https://github.com/RUCAIBox/MPOP.

pdf bib
TextBox: A Unified, Modularized, and Extensible Framework for Text Generation
Junyi Li | Tianyi Tang | Gaole He | Jinhao Jiang | Xiaoxuan Hu | Puzhao Xie | Zhipeng Chen | Zhuohao Yu | Wayne Xin Zhao | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

In this paper, we release an open-source library, called TextBox, to provide a unified, modularized, and extensible text generation framework. TextBox aims to support a broad set of text generation tasks and models. In our library, we implement 21 text generation models on 9 benchmark datasets, covering the categories of VAE, GAN, and pretrained language models. Meanwhile, our library maintains sufficient modularity and extensibility by properly decomposing the model architecture, inference, and learning process into highly reusable modules, which allows users to easily incorporate new models into our framework. The above features make TextBox especially suitable for researchers and practitioners to quickly reproduce baseline models and develop new models. TextBox is implemented based on PyTorch, and released under Apache License 2.0 at the link https://github.com/RUCAIBox/TextBox.

pdf bib
CRSLab: An Open-Source Toolkit for Building Conversational Recommender System
Kun Zhou | Xiaolei Wang | Yuanhang Zhou | Chenzhan Shang | Yuan Cheng | Wayne Xin Zhao | Yaliang Li | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

In recent years, conversational recommender systems (CRSs) have drawn a wide attention in the research community, which focus on providing high-quality recommendations to users via natural language conversations. However, due to diverse scenarios and data formats, existing studies on CRSs lack unified and standardized implementation or comparison. To tackle this challenge, we release an open-source toolkit CRSLab, which provides a unified and extensible framework with highly-decoupled modules to develop CRSs. Based on this framework, we collect 6 commonly used human-annotated CRS datasets and implement 19 models that include advanced techniques such as graph neural networks and pre-training models. Besides, our toolkit provides a series of automatic evaluation protocols and a human-machine interaction interface to evaluate and compare different CRS methods. The project and documents are released at https://github.com/RUCAIBox/CRSLab.

pdf bib
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
Ruiyang Ren | Yingqi Qu | Jing Liu | Wayne Xin Zhao | QiaoQiao She | Hua Wu | Haifeng Wang | Ji-Rong Wen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures in finding and ranking relevant information. Since both the two procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage reranking. A major contribution is that we introduce the dynamic listwise distillation, where we design a unified listwise training approach for both the retriever and the re-ranker. During the dynamic distillation, the retriever and the re-ranker can be adaptively improved according to each other’s relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for listwise training approach. Extensive experiments show the effectiveness of our approach on both MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA.

pdf bib
Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models
Kun Zhou | Wayne Xin Zhao | Sirui Wang | Fuzheng Zhang | Wei Wu | Ji-Rong Wen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent works have shown that powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs. However, it is still challenging to augment semantically relevant examples with sufficient diversity. In this work, we present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs. Based on the original token embeddings, we construct a multinomial mixture for augmenting virtual data embeddings, where a masked language model guarantees the semantic relevance and the Gaussian noise provides the augmentation diversity. Furthermore, a regularized training strategy is proposed to balance the two aspects. Extensive experiments on six datasets show that our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks. Our codes and data are publicly available at bluehttps://github.com/RUCAIBox/VDA.

pdf bib
Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models
Junyi Li | Tianyi Tang | Wayne Xin Zhao | Zhicheng Wei | Nicholas Jing Yuan | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval
Ruiyang Ren | Shangwen Lv | Yingqi Qu | Jing Liu | Wayne Xin Zhao | QiaoQiao She | Hua Wu | Haifeng Wang | Ji-Rong Wen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Towards Topic-Guided Conversational Recommender System
Kun Zhou | Yuanhang Zhou | Wayne Xin Zhao | Xiaoke Wang | Ji-Rong Wen
Proceedings of the 28th International Conference on Computational Linguistics

Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. To develop an effective CRS, the support of high-quality datasets is essential. Existing CRS datasets mainly focus on immediate requests from users, while lack proactive guidance to the recommendation scenario. In this paper, we contribute a new CRS dataset named TG-ReDial (Recommendation through Topic-Guided Dialog). Our dataset has two major features. First, it incorporates topic threads to enforce natural semantic transitions towards the recommendation scenario. Second, it is created in a semi-automatic way, hence human annotation is more reasonable and controllable. Based on TG-ReDial, we present the task of topic-guided conversational recommendation, and propose an effective approach to this task. Extensive experiments have demonstrated the effectiveness of our approach on three sub-tasks, namely topic prediction, item recommendation and response generation. TG-ReDial is available at bluehttps://github.com/RUCAIBox/TG-ReDial.

pdf bib
Wasserstein Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains
Weijie Yu | Chen Xu | Jun Xu | Liang Pang | Xiaopeng Gao | Xiaozhao Wang | Ji-Rong Wen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

One approach to matching texts from asymmetrical domains is projecting the input sequences into a common semantic space as feature vectors upon which the matching function can be readily defined and learned. In real-world matching practices, it is often observed that with the training goes on, the feature vectors projected from different domains tend to be indistinguishable. The phenomenon, however, is often overlooked in existing matching models. As a result, the feature vectors are constructed without any regularization, which inevitably increases the difficulty of learning the downstream matching functions. In this paper, we propose a novel match method tailored for text matching in asymmetrical domains, called WD-Match. In WD-Match, a Wasserstein distance-based regularizer is defined to regularize the features vectors projected from different domains. As a result, the method enforces the feature projection function to generate vectors such that those correspond to different domains cannot be easily discriminated. The training process of WD-Match amounts to a game that minimizes the matching loss regularized by the Wasserstein distance. WD-Match can be used to improve different text matching methods, by using the method as its underlying matching model. Four popular text matching methods have been exploited in the paper. Experimental results based on four publicly available benchmarks showed that WD-Match consistently outperformed the underlying methods and the baselines.

pdf bib
Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields
Jingxuan Yang | Kerui Xu | Jun Xu | Si Li | Sheng Gao | Jun Guo | Ji-Rong Wen | Nianwen Xue
Findings of the Association for Computational Linguistics: EMNLP 2020

Pronouns are often dropped in Chinese conversations and recovering the dropped pronouns is important for NLP applications such as Machine Translation. Existing approaches usually formulate this as a sequence labeling task of predicting whether there is a dropped pronoun before each token and its type. Each utterance is considered to be a sequence and labeled independently. Although these approaches have shown promise, labeling each utterance independently ignores the dependencies between pronouns in neighboring utterances. Modeling these dependencies is critical to improving the performance of dropped pronoun recovery. In this paper, we present a novel framework that combines the strength of Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances. Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models. Exploratory analysis also demonstrates that the GCRF did help to capture the dependencies between pronouns in neighboring utterances, thus contributes to the performance improvements.

2019

pdf bib
Generating Long and Informative Reviews with Aspect-Aware Coarse-to-Fine Decoding
Junyi Li | Wayne Xin Zhao | Ji-Rong Wen | Yang Song
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Generating long and informative review text is a challenging natural language generation task. Previous work focuses on word-level generation, neglecting the importance of topical and syntactic characteristics from natural languages. In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. First, we model the aspect transitions to capture the overall content flow. Then, to generate a sentence, an aspect-aware sketch will be predicted using an aspect-aware decoder. Finally, another decoder fills in the semantic slots by generating corresponding words. Our approach is able to jointly utilize aspect semantics, syntactic sketch, and context information. Extensive experiments results have demonstrated the effectiveness of the proposed model.

pdf bib
Domain Adaptation for Person-Job Fit with Transferable Deep Global Match Network
Shuqing Bian | Wayne Xin Zhao | Yang Song | Tao Zhang | Ji-Rong Wen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Person-job fit has been an important task which aims to automatically match job positions with suitable candidates. Previous methods mainly focus on solving the match task in single-domain setting, which may not work well when labeled data is limited. We study the domain adaptation problem for person-job fit. We first propose a deep global match network for capturing the global semantic interactions between two sentences from a job posting and a candidate resume respectively. Furthermore, we extend the match network and implement domain adaptation in three levels, sentence-level representation, sentence-level match, and global match. Extensive experiment results on a large real-world dataset consisting of six domains have demonstrated the effectiveness of the proposed model, especially when there is not sufficient labeled data.

pdf bib
A Neural Citation Count Prediction Model based on Peer Review Text
Siqing Li | Wayne Xin Zhao | Eddy Jing Yin | Ji-Rong Wen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Citation count prediction (CCP) has been an important research task for automatically estimating the future impact of a scholarly paper. Previous studies mainly focus on extracting or mining useful features from the paper itself or the associated authors. An important kind of data signals, peer review text, has not been utilized for the CCP task. In this paper, we take the initiative to utilize peer review data for the CCP task with a neural prediction model. Our focus is to learn a comprehensive semantic representation for peer review text for improving the prediction performance. To achieve this goal, we incorporate the abstract-review match mechanism and the cross-review match mechanism to learn deep features from peer review text. We also consider integrating hand-crafted features via a wide component. The deep and wide components jointly make the prediction. Extensive experiments have demonstrated the usefulness of the peer review data and the effectiveness of the proposed model. Our dataset has been released online.

2013

pdf bib
Improving Web Search Ranking by Incorporating Structured Annotation of Queries
Xiao Ding | Zhicheng Dou | Bing Qin | Ting Liu | Ji-Rong Wen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches
Shuming Shi | Huibin Zhang | Xiaojie Yuan | Ji-Rong Wen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf bib
Anchor Text Extraction for Academic Search
Shuming Shi | Fei Xing | Mingjie Zhu | Zaiqing Nie | Ji-Rong Wen
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

pdf bib
Employing Topic Models for Pattern-based Semantic Class Discovery
Huibin Zhang | Mingjie Zhu | Shuming Shi | Ji-Rong Wen
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP