Deng Cai


2021

pdf bib
Dynamic Semantic Graph Construction and Reasoning for Explainable Multi-hop Science Question Answering
Weiwen Xu | Huihui Zhang | Deng Cai | Wai Lam
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Assessing Dialogue Systems with Distribution Distances
Jiannan Xiang | Yahui Liu | Deng Cai | Huayang Li | Defu Lian | Lemao Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Exploiting Reasoning Chains for Multi-hop Science Question Answering
Weiwen Xu | Yang Deng | Huihui Zhang | Deng Cai | Wai Lam
Findings of the Association for Computational Linguistics: EMNLP 2021

We propose a novel Chain Guided Retriever-reader (CGR) framework to model the reasoning chain for multi-hop Science Question Answering. Our framework is capable of performing explainable reasoning without the need of any corpus-specific annotations, such as the ground-truth reasoning chain, or human-annotated entity mentions. Specifically, we first generate reasoning chains from a semantic graph constructed by Abstract Meaning Representation of retrieved evidence facts. A Chain-aware loss, concerning both local and global chain information, is also designed to enable the generated chains to serve as distant supervision signals for training the retriever, where reinforcement learning is also adopted to maximize the utility of the reasoning chains. Our framework allows the retriever to capture step-by-step clues of the entire reasoning process, which is not only shown to be effective on two challenging multi-hop Science QA tasks, namely OpenBookQA and ARC-Challenge, but also favors explainability.

pdf bib
Multilingual AMR Parsing with Noisy Knowledge Distillation
Deng Cai | Xin Li | Jackie Chun-Sing Ho | Lidong Bing | Wai Lam
Findings of the Association for Computational Linguistics: EMNLP 2021

We study multilingual AMR parsing from the perspective of knowledge distillation, where the aim is to learn and improve a multilingual AMR parser by using an existing English parser as its teacher. We constrain our exploration in a strict multilingual setting: there is but one model to parse all different languages including English. We identify that noisy input and precise output are the key to successful distillation. Together with extensive pre-training, we obtain an AMR parser whose performances surpass all previously published results on four different foreign languages, including German, Spanish, Italian, and Chinese, by large margins (up to 18.8 Smatch points on Chinese and on average 11.3 Smatch points). Our parser also achieves comparable performance on English to the latest state-of-the-art English-only parser.

pdf bib
Non-Autoregressive Text Generation with Pre-trained Language Models
Yixuan Su | Deng Cai | Yan Wang | David Vandyke | Simon Baker | Piji Li | Nigel Collier
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Non-autoregressive generation (NAG) has recently attracted great attention due to its fast inference speed. However, the generation quality of existing NAG models still lags behind their autoregressive counterparts. In this work, we show that BERT can be employed as the backbone of a NAG model for a greatly improved performance. Additionally, we devise two mechanisms to alleviate the two common problems of vanilla NAG models: the inflexibility of prefixed output length and the conditional independence of individual token predictions. To further strengthen the speed advantage of the proposed model, we propose a new decoding strategy, ratio-first, for applications where the output lengths can be approximately estimated beforehand. For a comprehensive evaluation, we test the proposed model on three text generation tasks, including text summarization, sentence compression and machine translation. Experimental results show that our model significantly outperforms existing non-autoregressive baselines and achieves competitive performance with many strong autoregressive models. In addition, we also conduct extensive analysis experiments to reveal the effect of each proposed component.

pdf bib
Dialogue Response Selection with Hierarchical Curriculum Learning
Yixuan Su | Deng Cai | Qingyu Zhou | Zibo Lin | Simon Baker | Yunbo Cao | Shuming Shi | Nigel Collier | Yan Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We study the learning of a matching model for dialogue response selection. Motivated by the recent finding that models trained with random negative samples are not ideal in real-world scenarios, we propose a hierarchical curriculum learning framework that trains the matching model in an “easy-to-difficult” scheme. Our learning framework consists of two complementary curricula: (1) corpus-level curriculum (CC); and (2) instance-level curriculum (IC). In CC, the model gradually increases its ability in finding the matching clues between the dialogue context and a response candidate. As for IC, it progressively strengthens the model’s ability in identifying the mismatching information between the dialogue context and a response candidate. Empirical studies on three benchmark datasets with three state-of-the-art matching models demonstrate that the proposed learning framework significantly improves the model performance across various evaluation metrics.

pdf bib
Neural Machine Translation with Monolingual Translation Memory
Deng Cai | Yan Wang | Huayang Li | Wai Lam | Lemao Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Prior work has proved that Translation Memory (TM) can boost the performance of Neural Machine Translation (NMT). In contrast to existing work that uses bilingual corpus as TM and employs source-side similarity search for memory retrieval, we propose a new framework that uses monolingual memory and performs learnable memory retrieval in a cross-lingual manner. Our framework has unique advantages. First, the cross-lingual memory retriever allows abundant monolingual data to be TM. Second, the memory retriever and NMT model can be jointly optimized for the ultimate translation goal. Experiments show that the proposed method obtains substantial improvements. Remarkably, it even outperforms strong TM-augmented NMT baselines using bilingual TM. Owning to the ability to leverage monolingual data, our model also demonstrates effectiveness in low-resource and domain adaptation scenarios.

2020

pdf bib
AMR Parsing via Graph-Sequence Iterative Inference
Deng Cai | Wai Lam
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a new end-to-end model that treats AMR parsing as a series of dual decisions on the input sequence and the incrementally constructed graph. At each time step, our model performs multiple rounds of attention, reasoning, and composition that aim to answer two critical questions: (1) which part of the input sequence to abstract; and (2) where in the output graph to construct the new concept. We show that the answers to these two questions are mutually causalities. We design a model based on iterative inference that helps achieve better answers in both perspectives, leading to greatly improved parsing accuracy. Our experimental results significantly outperform all previously reported Smatch scores by large margins. Remarkably, without the help of any large-scale pre-trained language model (e.g., BERT), our model already surpasses previous state-of-the-art using BERT. With the help of BERT, we can push the state-of-the-art results to 80.2% on LDC2017T10 (AMR 2.0) and 75.4% on LDC2014T12 (AMR 1.0).

pdf bib
The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection
Zibo Lin | Deng Cai | Yan Wang | Xiaojiang Liu | Haitao Zheng | Shuming Shi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Response selection plays a vital role in building retrieval-based conversation systems. Despite that response selection is naturally a learning-to-rank problem, most prior works take a point-wise view and train binary classifiers for this task: each response candidate is labeled either relevant (one) or irrelevant (zero). On the one hand, this formalization can be sub-optimal due to its ignorance of the diversity of response quality. On the other hand, annotating grayscale data for learning-to-rank can be prohibitively expensive and challenging. In this work, we show that grayscale data can be automatically constructed without human effort. Our method employs off-the-shelf response retrieval models and response generation models as automatic grayscale data generators. With the constructed grayscale data, we propose multi-level ranking objectives for training, which can (1) teach a matching model to capture more fine-grained context-response relevance difference and (2) reduce the train-test discrepancy in terms of distractor strength. Our method is simple, effective, and universal. Experiments on three benchmark datasets and four state-of-the-art matching models show that the proposed approach brings significant and consistent performance improvements.

2019

pdf bib
Retrieval-guided Dialogue Response Generation via a Matching-to-Generation Framework
Deng Cai | Yan Wang | Wei Bi | Zhaopeng Tu | Xiaojiang Liu | Shuming Shi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

End-to-end sequence generation is a popular technique for developing open domain dialogue systems, though they suffer from the safe response problem. Researchers have attempted to tackle this problem by incorporating generative models with the returns of retrieval systems. Recently, a skeleton-then-response framework has been shown promising results for this task. Nevertheless, how to precisely extract a skeleton and how to effectively train a retrieval-guided response generator are still challenging. This paper presents a novel framework in which the skeleton extraction is made by an interpretable matching model and the following skeleton-guided response generation is accomplished by a separately trained generator. Extensive experiments demonstrate the effectiveness of our model designs.

pdf bib
Core Semantic First: A Top-down Approach for AMR Parsing
Deng Cai | Wai Lam
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We introduce a novel scheme for parsing a piece of text into its Abstract Meaning Representation (AMR): Graph Spanning based Parsing (GSP). One novel characteristic of GSP is that it constructs a parse graph incrementally in a top-down fashion. Starting from the root, at each step, a new node and its connections to existing nodes will be jointly predicted. The output graph spans the nodes by the distance to the root, following the intuition of first grasping the main ideas then digging into more details. The core semantic first principle emphasizes capturing the main ideas of a sentence, which is of great interest. We evaluate our model on the latest AMR sembank and achieve the state-of-the-art performance in the sense that no heuristic graph re-categorization is adopted. More importantly, the experiments show that our parser is especially good at obtaining the core semantics.

pdf bib
Charge-Based Prison Term Prediction with Deep Gating Network
Huajie Chen | Deng Cai | Wei Dai | Zehui Dai | Yadong Ding
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Judgment prediction for legal cases has attracted much research efforts for its practice use, of which the ultimate goal is prison term prediction. While existing work merely predicts the total prison term, in reality a defendant is often charged with multiple crimes. In this paper, we argue that charge-based prison term prediction (CPTP) not only better fits realistic needs, but also makes the total prison term prediction more accurate and interpretable. We collect the first large-scale structured data for CPTP and evaluate several competitive baselines. Based on the observation that fine-grained feature selection is the key to achieving good performance, we propose the Deep Gating Network (DGN) for charge-specific feature selection and aggregation. Experiments show that DGN achieves the state-of-the-art performance.

pdf bib
Reinforced Dynamic Reasoning for Conversational Question Generation
Boyuan Pan | Hao Li | Ziyu Yao | Deng Cai | Huan Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper investigates a new task named Conversational Question Generation (CQG) which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is a crucial task for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards that end, we propose a new approach named Reinforced Dynamic Reasoning network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in a dynamic manner to better understand what has been asked and what to ask next about the passage into the general encoder-decoder framework. To encourage producing meaningful questions, we leverage a popular question answering (QA) model to provide feedback and fine-tune the question generator using a reinforcement learning mechanism. Empirical results on the recently released CoQA dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants. Moreover, to show the applicability of our method, we also apply it to create multi-turn question-answering conversations for passages in SQuAD.

pdf bib
Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory
Deng Cai | Yan Wang | Wei Bi | Zhaopeng Tu | Xiaojiang Liu | Wai Lam | Shuming Shi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Traditional generative dialogue models generate responses solely from input queries. Such information is insufficient for generating a specific response since a certain query could be answered in multiple ways. Recently, researchers have attempted to fill the information gap by exploiting information retrieval techniques. For a given query, similar dialogues are retrieved from the entire training data and considered as an additional knowledge source. While the use of retrieval may harvest extensive information, the generative models could be overwhelmed, leading to unsatisfactory performance. In this paper, we propose a new framework which exploits retrieval results via a skeleton-to-response paradigm. At first, a skeleton is extracted from the retrieved dialogues. Then, both the generated skeleton and the original query are used for response generation via a novel response generator. Experimental results show that our approach significantly improves the informativeness of the generated responses

2018

pdf bib
Translating a Math Word Problem to a Expression Tree
Lei Wang | Yan Wang | Deng Cai | Dongxiang Zhang | Xiaojiang Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Sequence-to-sequence (SEQ2SEQ) models have been successfully applied to automatic math word problem solving. Despite its simplicity, a drawback still remains: a math word problem can be correctly solved by more than one equations. This non-deterministic transduction harms the performance of maximum likelihood estimation. In this paper, by considering the uniqueness of expression tree, we propose an equation normalization method to normalize the duplicated equations. Moreover, we analyze the performance of three popular SEQ2SEQ models on the math word problem solving. We find that each model has its own specialty in solving problems, consequently an ensemble model is then proposed to combine their advantages. Experiments on dataset Math23K show that the ensemble model with equation normalization significantly outperforms the previous state-of-the-art methods.

pdf bib
Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference
Boyuan Pan | Yazheng Yang | Zhou Zhao | Yueting Zhuang | Deng Cai | Xiaofei He
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), is one of the most important problems in natural language processing. It requires to infer the logical relationship between two given sentences. While current approaches mostly focus on the interaction architectures of the sentences, in this paper, we propose to transfer knowledge from some important discourse markers to augment the quality of the NLI model. We observe that people usually use some discourse markers such as “so” or “but” to represent the logical relationship between two sentences. These words potentially have deep connections with the meanings of the sentences, thus can be utilized to help improve the representations of them. Moreover, we use reinforcement learning to optimize a new objective function with a reward defined by the property of the NLI datasets to make full use of the labels information. Experiments show that our method achieves the state-of-the-art performance on several large-scale datasets.

2017

pdf bib
Fast and Accurate Neural Word Segmentation for Chinese
Deng Cai | Hai Zhao | Zhisong Zhang | Yuan Xin | Yongjian Wu | Feiyue Huang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both training and working procedures of the current neural models are computationally inefficient. In this paper, we propose a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurate than state-of-the-art neural models on Chinese benchmark datasets.

2016

pdf bib
Neural Word Segmentation Learning for Chinese
Deng Cai | Hai Zhao
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)