Lei Li


2021

pdf bib
Dynamic Knowledge Distillation for Pre-trained Language Models
Lei Li | Yankai Lin | Shuhuai Ren | Peng Li | Jie Zhou | Xu Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Knowledge distillation (KD) has been proved effective for compressing large-scale pre-trained language models. However, existing methods conduct KD statically, e.g., the student model aligns its output distribution to that of a selected teacher model on the pre-defined training dataset. In this paper, we explore whether a dynamic knowledge distillation that empowers the student to adjust the learning procedure according to its competency, regarding the student performance and learning efficiency. We explore the dynamical adjustments on three aspects: teacher model adoption, data selection, and KD objective adaptation. Experimental results show that (1) proper selection of teacher model can boost the performance of student model; (2) conducting KD with 10% informative instances achieves comparable performance while greatly accelerates the training; (3) the student performance can be boosted by adjusting the supervision contribution of different alignment objective. We find dynamic knowledge distillation is promising and provide discussions on potential future directions towards more efficient KD methods.

pdf bib
Learning Logic Rules for Document-Level Relation Extraction
Dongyu Ru | Changzhi Sun | Jiangtao Feng | Lin Qiu | Hao Zhou | Weinan Zhang | Yong Yu | Lei Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Document-level relation extraction aims to identify relations between entities in a whole document. Prior efforts to capture long-range dependencies have relied heavily on implicitly powerful representations learned through (graph) neural networks, which makes the model less transparent. To tackle this challenge, in this paper, we propose LogiRE, a novel probabilistic model for document-level relation extraction by learning logic rules. LogiRE treats logic rules as latent variables and consists of two modules: a rule generator and a relation extractor. The rule generator is to generate logic rules potentially contributing to final predictions, and the relation extractor outputs final predictions based on the generated logic rules. Those two modules can be efficiently optimized with the expectation-maximization (EM) algorithm. By introducing logic rules into neural networks, LogiRE can explicitly capture long-range dependencies as well as enjoy better interpretation. Empirical results show that significantly outperforms several strong baselines in terms of relation performance and logical consistency. Our code is available at https://github.com/rudongyu/LogiRE.

pdf bib
Gradient-Based Adversarial Factual Consistency Evaluation for Abstractive Summarization
Zhiyuan Zeng | Jiaze Chen | Weiran Xu | Lei Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Neural abstractive summarization systems have gained significant progress in recent years. However, abstractive summarization often produce inconsisitent statements or false facts. How to automatically generate highly abstract yet factually correct summaries? In this paper, we proposed an efficient weak-supervised adversarial data augmentation approach to form the factual consistency dataset. Based on the artificial dataset, we train an evaluation model that can not only make accurate and robust factual consistency discrimination but is also capable of making interpretable factual errors tracing by backpropagated gradient distribution on token embeddings. Experiments and analysis conduct on public annotated summarization and factual consistency datasets demonstrate our approach effective and reasonable.

pdf bib
Learning Kernel-Smoothed Machine Translation with Retrieved Examples
Qingnan Jiang | Mingxuan Wang | Jun Cao | Shanbo Cheng | Shujian Huang | Lei Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

How to effectively adapt neural machine translation (NMT) models according to emerging cases without retraining? Despite the great success of neural machine translation, updating the deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples. However, non-parametric methods are prone to overfit the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that even without expensive retraining, KSTER is able to achieve improvement of 1.1 to 1.5 BLEU scores over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER.

pdf bib
Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
Shuhuai Ren | Jinchao Zhang | Lei Li | Xu Sun | Jie Zhou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-resource or class-imbalanced situations. Traditional methods first devise task-specific operations such as Synonym Substitute, then preset the corresponding parameters such as the substitution rate artificially, which require a lot of prior knowledge and are prone to fall into the sub-optimum. Besides, the number of editing operations is limited in the previous methods, which decreases the diversity of the augmented data and thus restricts the performance gain. To overcome the above limitations, we propose a framework named Text AutoAugment (TAA) to establish a compositional and learnable paradigm for data augmentation. We regard a combination of various operations as an augmentation policy and utilize an efficient Bayesian Optimization algorithm to automatically search for the best policy, which substantially improves the generalization capability of models. Experiments on six benchmark datasets show that TAA boosts classification accuracy in low-resource and class-imbalanced regimes by an average of 8.8% and 9.7%, respectively, outperforming strong baselines.

pdf bib
UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
Huanqin Wu | Wei Liu | Lei Li | Dan Nie | Tao Chen | Feng Zhang | Di Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Learning Shared Semantic Space for Speech-to-Text Translation
Chi Han | Mingxuan Wang | Heng Ji | Lei Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Contrastive Aligned Joint Learning for Multilingual Summarization
Danqing Wang | Jiaze Chen | Hao Zhou | Xipeng Qiu | Lei Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Language Tags Matter for Zero-Shot Neural Machine Translation
Liwei Wu | Shanbo Cheng | Mingxuan Wang | Lei Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Probabilistic Graph Reasoning for Natural Proof Generation
Changzhi Sun | Xinbo Zhang | Jiangjie Chen | Chun Gan | Yuanbin Wu | Jiaze Chen | Hao Zhou | Lei Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
Lei Li | Yankai Lin | Deli Chen | Shuhuai Ren | Peng Li | Jie Zhou | Xu Sun
Findings of the Association for Computational Linguistics: EMNLP 2021

Dynamic early exiting aims to accelerate the inference of pre-trained language models (PLMs) by emitting predictions in internal layers without passing through the entire model. In this paper, we empirically analyze the working mechanism of dynamic early exiting and find that it faces a performance bottleneck under high speed-up ratios. On one hand, the PLMs’ representations in shallow layers lack high-level semantic information and thus are not sufficient for accurate predictions. On the other hand, the exiting decisions made by internal classifiers are unreliable, leading to wrongly emitted early predictions. We instead propose a new framework for accelerating the inference of PLMs, CascadeBERT, which dynamically selects proper-sized and complete models in a cascading manner, providing comprehensive representations for predictions. We further devise a difficulty-aware objective, encouraging the model to output the class probability that reflects the real difficulty of each instance for a more reliable cascading mechanism. Experimental results show that CascadeBERT can achieve an overall 15% improvement under 4x speed-up compared with existing dynamic early exiting methods on six classification tasks, yielding more calibrated and accurate predictions.

pdf bib
Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation
Hua Zheng | Lei Li | Damai Dai | Deli Chen | Tianyu Liu | Xu Sun | Yang Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

In parataxis languages like Chinese, word meanings are constructed using specific word-formations, which can help to disambiguate word senses. However, such knowledge is rarely explored in previous word sense disambiguation (WSD) methods. In this paper, we propose to leverage word-formation knowledge to enhance Chinese WSD. We first construct a large-scale Chinese lexical sample WSD dataset with word-formations. Then, we propose a model FormBERT to explicitly incorporate word-formations into sense disambiguation. To further enhance generalizability, we design a word-formation predictor module in case word-formation annotations are unavailable. Experimental results show that our method brings substantial performance improvement over strong baselines.

pdf bib
Multilingual Translation via Grafting Pre-trained Language Models
Zewei Sun | Mingxuan Wang | Lei Li
Findings of the Association for Computational Linguistics: EMNLP 2021

Can pre-trained BERT for one language and GPT for another be glued together to translate texts? Self-supervised training using only monolingual data has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder can be challenging in machine translation, for GPT-like models lack a cross-attention component that is needed in seq2seq decoders. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. With monolingual data for pre-training and parallel data for grafting training, we maximally take advantage of the usage of both types of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions comparing with the multilingual Transformer of the same size.

pdf bib
Counter-Interference Adapter for Multilingual Machine Translation
Yaoming Zhu | Jiangtao Feng | Chengqi Zhao | Mingxuan Wang | Lei Li
Findings of the Association for Computational Linguistics: EMNLP 2021

Developing a unified multilingual model has been a long pursuing goal for machine translation. However, existing approaches suffer from performance degradation - a single multilingual model is inferior to separately trained bilingual ones on rich-resource languages. We conjecture that such a phenomenon is due to interference brought by joint training with multiple languages. To accommodate the issue, we propose CIAT, an adapted Transformer model with a small parameter overhead for multilingual machine translation. We evaluate CIAT on multiple benchmark datasets, including IWSLT, OPUS-100, and WMT. Experiments show that the CIAT consistently outperforms strong multilingual baselines on 64 of total 66 language directions, 42 of which have above 0.5 BLEU improvement.

pdf bib
Secoco: Self-Correcting Encoding for Neural Machine Translation
Tao Wang | Chengqi Zhao | Mingxuan Wang | Lei Li | Hang Li | Deyi Xiong
Findings of the Association for Computational Linguistics: EMNLP 2021

This paper presents Self-correcting Encoding (Secoco), a framework that effectively deals with noisy input for robust neural machine translation by introducing self-correcting predictors. Different from previous robust approaches, Secoco enables NMT to explicitly correct noisy inputs and delete specific errors simultaneously with the translation decoding process. Secoco is able to achieve significant improvements over strong baselines on two real-world test sets and a benchmark WMT dataset with good interpretability. We will make our code and dataset publicly available soon.

pdf bib
ENPAR:Enhancing Entity and Entity Pair Representations for Joint Entity Relation Extraction
Yijun Wang | Changzhi Sun | Yuanbin Wu | Hao Zhou | Lei Li | Junchi Yan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Current state-of-the-art systems for joint entity relation extraction (Luan et al., 2019; Wad-den et al., 2019) usually adopt the multi-task learning framework. However, annotations for these additional tasks such as coreference resolution and event extraction are always equally hard (or even harder) to obtain. In this work, we propose a pre-training method ENPAR to improve the joint extraction performance. ENPAR requires only the additional entity annotations that are much easier to collect. Unlike most existing works that only consider incorporating entity information into the sentence encoder, we further utilize the entity pair information. Specifically, we devise four novel objectives,i.e., masked entity typing, masked entity prediction, adversarial context discrimination, and permutation prediction, to pre-train an entity encoder and an entity pair encoder. Comprehensive experiments show that the proposed pre-training method achieves significant improvement over BERT on ACE05, SciERC, and NYT, and outperforms current state-of-the-art on ACE05.

pdf bib
The Volctrans Neural Speech Translation System for IWSLT 2021
Chengqi Zhao | Zhicheng Liu | Jian Tong | Tao Wang | Mingxuan Wang | Rong Ye | Qianqian Dong | Jun Cao | Lei Li
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves 7.9 BLEU improvements over the benchmark on the MuST-C test set and is even approaching the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark at around 7 BLEU on the same latency regime. We release our code and model to facilitate both future research works and industrial applications.

pdf bib
Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
Wenkai Yang | Lei Li | Zhiyuan Zhang | Xuancheng Ren | Xu Sun | Bin He
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific trigger word inserted. Previous backdoor attacking methods usually assume that attackers have a certain degree of data knowledge, either the dataset which users would use or proxy datasets for a similar task, for implementing the data poisoning procedure. However, in this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector, with almost no accuracy sacrificed on clean samples. Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier. We hope this work can raise the awareness of such a critical security risk hidden in the embedding layers of NLP models. Our code is available at https://github.com/lancopku/Embedding-Poisoning.

pdf bib
Decompose, Fuse and Generate: A Formation-Informed Method for Chinese Definition Generation
Hua Zheng | Damai Dai | Lei Li | Tianyu Liu | Zhifang Sui | Baobao Chang | Yang Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we tackle the task of Definition Generation (DG) in Chinese, which aims at automatically generating a definition for a word. Most existing methods take the source word as an indecomposable semantic unit. However, in parataxis languages like Chinese, word meanings can be composed using the word formation process, where a word (“桃花”, peach-blossom) is formed by formation components (“桃”, peach; “花”, flower) using a formation rule (Modifier-Head). Inspired by this process, we propose to enhance DG with word formation features. We build a formation-informed dataset, and propose a model DeFT, which Decomposes words into formation features, dynamically Fuses different features through a gating mechanism, and generaTes word definitions. Experimental results show that our method is both effective and robust.

pdf bib
Generative Imagination Elevates Machine Translation
Quanyu Long | Mingxuan Wang | Lei Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

There are common semantics shared across text and images. Given a sentence in a source language, whether depicting the visual scene helps translation into a target language? Existing multimodal neural machine translation methods (MNMT) require triplets of bilingual sentence - image for training and tuples of source sentence - image for inference. In this paper, we propose ImagiT, a novel machine translation method via visual imagination. ImagiT first learns to generate visual representation from the source sentence, and then utilizes both source sentence and the “imagined representation” to produce a target translation. Unlike previous methods, it only needs the source sentence at the inference time. Experiments demonstrate that ImagiT benefits from visual imagination and significantly outperforms the text-only neural machine translation baselines. Further analysis reveals that the imagination process in ImagiT helps fill in missing information when performing the degradation strategy.

pdf bib
Cross-lingual Supervision Improves Unsupervised Neural Machine Translation
Mingxuan Wang | Hongxiao Bai | Lei Li | Hai Zhao
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

We propose to improve unsupervised neural machine translation with cross-lingual supervision (), which utilizes supervision signals from high resource language pairs to improve the translation of zero-source languages. Specifically, for training En-Ro system without parallel corpus, we can leverage the corpus from En-Fr and En-De to collectively train the translation from one language into many languages under one model. % is based on multilingual models which require no changes to the standard unsupervised NMT. Simple and effective, significantly improves the translation quality with a big margin in the benchmark unsupervised translation tasks, and even achieves comparable performance to supervised NMT. In particular, on WMT’14 -tasks achieves 37.6 and 35.18 BLEU score, which is very close to the large scale supervised setting and on WMT’16 -tasks achieves 35.09 BLEU score which is even better than the supervised Transformer baseline.

pdf bib
Autocorrect in the Process of Translation — Multi-task Learning Improves Dialogue Machine Translation
Tao Wang | Chengqi Zhao | Mingxuan Wang | Lei Li | Deyi Xiong
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Automatic translation of dialogue texts is a much needed demand in many real life scenarios. However, the currently existing neural machine translation delivers unsatisfying results. In this paper, we conduct a deep analysis of a dialogue corpus and summarize three major issues on dialogue translation, including pronoun dropping (), punctuation dropping (), and typos (). In response to these challenges, we propose a joint learning method to identify omission and typo, and utilize context to translate dialogue utterances. To properly evaluate the performance, we propose a manually annotated dataset with 1,931 Chinese-English parallel utterances from 300 dialogues as a benchmark testbed for dialogue translation. Our experiments show that the proposed method improves translation quality by 3.2 BLEU over the baselines. It also elevates the recovery rate of omitted pronouns from 26.09% to 47.16%. We will publish the code and dataset publicly at https://xxx.xx.

pdf bib
LightSeq: A High Performance Inference Library for Transformers
Xiaohui Wang | Ying Xiong | Yang Wei | Mingxuan Wang | Lei Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Transformer and its variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose , a highly efficient inference library for models in the Transformer family. includes a series of GPU optimization techniques to both streamline the computation of Transformer layers and reduce memory footprint. supports models trained using PyTorch and Tensorflow. Experimental results on standard machine translation benchmarks show that achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with , a concurrent CUDA implementation. The code will be released publicly after the review.

pdf bib
UniRE: A Unified Label Space for Entity Relation Extraction
Yijun Wang | Changzhi Sun | Yuanbin Wu | Hao Zhou | Lei Li | Junchi Yan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Many joint entity relation extraction models setup two separated label spaces for the two sub-tasks (i.e., entity detection and relation classification). We argue that this setting may hinder the information interaction between entities and relations. In this work, we propose to eliminate the different treatment on the two sub-tasks’ label spaces. The input of our model is a table containing all word pairs from a sentence. Entities and relations are represented by squares and rectangles in the table. We apply a unified classifier to predict each cell’s label, which unifies the learning of two sub-tasks. For testing, an effective (yet fast) approximate decoder is proposed for finding squares and rectangles from tables. Experiments on three benchmarks (ACE04, ACE05, SciERC) show that, using only half the number of parameters, our model achieves competitive accuracy with the best extractor, and is faster.

pdf bib
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Xiao Pan | Mingxuan Wang | Liwei Wu | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Existing multilingual machine translation approaches mainly focus on English-centric directions, while the non-English directions still lag behind. In this work, we aim to build a many-to-many translation system with an emphasis on the quality of non-English language directions. Our intuition is based on the hypothesis that a universal cross-language representation leads to better multilingual translation performance. To this end, we propose mRASP2, a training method to obtain a single unified multilingual translation model. mRASP2 is empowered by two techniques: a) a contrastive learning scheme to close the gap among representations of different languages, and b) data augmentation on both multiple parallel and monolingual data to further align token representations. For English-centric directions, mRASP2 achieves competitive or even better performance than a strong pre-trained model mBART on tens of WMT benchmarks. For non-English directions, mRASP2 achieves an improvement of average 10+ BLEU compared with the multilingual baseline

pdf bib
Learning Language Specific Sub-network for Multilingual Machine Translation
Zehui Lin | Liwei Wu | Mingxuan Wang | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multilingual neural machine translation aims at learning a single translation model for multiple languages. These jointly trained models often suffer from performance degradationon rich-resource language pairs. We attribute this degeneration to parameter interference. In this paper, we propose LaSS to jointly train a single unified multilingual MT model. LaSS learns Language Specific Sub-network (LaSS) for each language pair to counter parameter interference. Comprehensive experiments on IWSLT and WMT datasets with various Transformer architectures show that LaSS obtains gains on 36 language pairs by up to 1.2 BLEU. Besides, LaSS shows its strong generalization performance at easy adaptation to new language pairs and zero-shot translation. LaSS boosts zero-shot translation with an average of 8.3 BLEU on 30 language pairs. Codes and trained models are available at https://github.com/NLP-Playground/LaSS.

pdf bib
Glancing Transformer for Non-Autoregressive Neural Machine Translation
Lihua Qian | Hao Zhou | Yu Bao | Mingxuan Wang | Lin Qiu | Weinan Zhang | Yong Yu | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recent work on non-autoregressive neural machine translation (NAT) aims at improving the efficiency by parallel decoding without sacrificing the quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM) for single-pass parallel generation models. With GLM, we develop Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8×-15× speedup. Note that GLAT does not modify the network architecture, which is a training method to learn word interdependency. Experiments on multiple WMT language directions show that GLAT outperforms all previous single pass non-autoregressive methods, and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points.

pdf bib
Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker
Runxin Xu | Tianyu Liu | Lei Li | Baobao Chang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Document-level event extraction aims to recognize event information from a whole piece of article. Existing methods are not effective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlation among events in a document is non-trivial to model. In this paper, we propose Heterogeneous Graph-based Interaction Model with a Tracker (GIT) to solve the aforementioned two challenges. For the first challenge, GIT constructs a heterogeneous graph interaction network to capture global interactions among different sentences and entity mentions. For the second, GIT introduces a Tracker module to track the extracted events and hence capture the interdependency among the events. Experiments on a large-scale dataset (Zheng et al, 2019) show GIT outperforms the previous methods by 2.8 F1. Further analysis reveals is effective in extracting multiple correlated events and event arguments that scatter across the document.

pdf bib
Personalized Transformer for Explainable Recommendation
Lei Li | Yongfeng Zhang | Li Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Personalization of natural language generation plays a vital role in a large spectrum of tasks, such as explainable recommendation, review summarization and dialog systems. In these tasks, user and item IDs are important identifiers for personalization. Transformer, which is demonstrated with strong language modeling capability, however, is not personalized and fails to make use of the user and item IDs since the ID tokens are not even in the same semantic space as the words. To address this problem, we present a PErsonalized Transformer for Explainable Recommendation (PETER), on which we design a simple and effective learning objective that utilizes the IDs to predict the words in the target explanation, so as to endow the IDs with linguistic meanings and to achieve personalized Transformer. Besides generating explanations, PETER can also make recommendations, which makes it a unified model for the whole recommendation-explanation pipeline. Extensive experiments show that our small unpretrained model outperforms fine-tuned BERT on the generation task, in terms of both effectiveness and efficiency, which highlights the importance and the nice utility of our design.

pdf bib
Vocabulary Learning via Optimal Transport for Neural Machine Translation
Jingjing Xu | Hao Zhou | Chun Gan | Zaixiang Zheng | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what is a good vocabulary and whether we can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of vocabulary from the perspective of information theory. It motivates us to formulate the quest of vocabularization – finding the best token dictionary with a proper size – as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT beats widely-used vocabularies in diverse scenarios, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. For example, VOLT achieves 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces the search time from 384 GPU hours to 30 GPU hours on English-German translation. Codes are available at https://github.com/Jingjing-NLP/VOLT.

pdf bib
NeurST: Neural Speech Translation Toolkit
Chengqi Zhao | Mingxuan Wang | Qianqian Dong | Rong Ye | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

NeurST is an open-source toolkit for neural speech translation. The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced speech translation research and products. NeurST aims at facilitating the speech translation research for NLP researchers and building reliable benchmarks for this field. It provides step-by-step recipes for feature extraction, data preprocessing, distributed training, and evaluation. In this paper, we will introduce the framework design of NeurST and show experimental results for different benchmark datasets, which can be regarded as reliable baselines for future research. The toolkit is publicly available at https://github.com/bytedance/neurst and we will continuously update the performance of with other counterparts and studies at https://st-benchmark.github.io/.

pdf bib
Pre-training Methods for Neural Machine Translation
Mingxuan Wang | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Tutorial Abstracts

This tutorial provides a comprehensive guide to make the most of pre-training for neural machine translation. Firstly, we will briefly introduce the background of NMT, pre-training methodology, and point out the main challenges when applying pre-training for NMT. Then we will focus on analysing the role of pre-training in enhancing the performance of NMT, how to design a better pre-training model for executing specific NMT tasks and how to better integrate the pre-trained model into NMT system. In each part, we will provide examples, discuss training techniques and analyse what is transferred when applying pre-training.

2020

pdf bib
CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization
Lei Li | Yang Xie | Wei Liu | Yinan Liu | Yafei Jiang | Siya Qi | Xingyuan Li
Proceedings of the First Workshop on Scholarly Document Processing

Our system participates in two shared tasks, CL-SciSumm 2020 and LongSumm 2020. In the CL-SciSumm shared task, based on our previous work, we apply more machine learning methods on position features and content features for facet classification in Task1B. And GCN is introduced in Task2 to perform extractive summarization. In the LongSumm shared task, we integrate both the extractive and abstractive summarization ways. Three methods were tested which are T5 Fine-tuning, DPPs Sampling, and GRU-GCN/GAT.

pdf bib
Extractive Financial Narrative Summarisation based on DPPs
Lei Li | Yafei Jiang | Yinan Liu
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

We participate in the FNS-Summarisation 2020 shared task to be held at FNP 2020 workshop at COLING 2020. Based on Determinantal Point Processes (DPPs), we build an extractive automatic financial summarisation system for the specific task. In this system, we first analyze the long report data to select the important narrative parts and generate an intermediate document. Next, we build the kernel Matrix L for the intermediate document, which represents the quality of its sentences. On the basis of L, we then can use the DPPs sampling algorithm to choose those sentences with high quality and diversity as the final summary sentences.

pdf bib
Do you have the right scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods
Ning Miao | Yuxuan Song | Hao Zhou | Lei Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimate problem. In this paper, we propose MC-Tailor, a novel method to alleviate the above issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach.

pdf bib
Xiaomingbot: A Multilingual Robot News Reporter
Runxin Xu | Jun Cao | Mingxuan Wang | Jiaze Chen | Hao Zhou | Ying Zeng | Yuping Wang | Li Chen | Xiang Yin | Xijin Zhang | Songcheng Jiang | Yuxuan Wang | Lei Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four inte- gral capabilities: news generation, news translation, news reading and avatar animation. Its system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages, and reads the multi- lingual rendition through synthesized speech. Notably, Xiaomingbot utilizes a voice cloning technology to synthesize the speech trained from a real person’s voice data in one input language. The proposed system enjoys several merits: it has an animated avatar, and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles, and gained over 150,000 followers on social media platforms.

pdf bib
The Volctrans Machine Translation System for WMT20
Liwei Wu | Xiao Pan | Zehui Lin | Yaoming Zhu | Mingxuan Wang | Lei Li
Proceedings of the Fifth Conference on Machine Translation

This paper describes our submission systems for VolcTrans for WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer (CITATION), into which we also employed new architectures (bigger or deeper Transformers, dynamic convolution). The final systems include text pre-process, subword(a.k.a. BPE(CITATION)), baseline model training, iterative back-translation, model ensemble, knowledge distillation and multilingual pre-training.

pdf bib
Volctrans Parallel Corpus Filtering System for WMT 2020
Runxin Xu | Zhuo Zhi | Jun Cao | Mingxuan Wang | Lei Li
Proceedings of the Fifth Conference on Machine Translation

In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions. The task requires the participants to align potential parallel sentence pairs out of the given document pairs, and score them so that low-quality pairs can be filtered. Our system, Volctrans, is made of two modules, i.e., a mining module and a scoring module. Based on the word alignment model, the mining mod- ule adopts an iterative mining strategy to extract latent parallel sentences. In the scoring module, an XLM-based scorer provides scores, followed by reranking mechanisms and ensemble. Our submissions outperform the baseline by 3.x/2.x and 2.x/2.x for km-en and ps-en on From Scratch/Fine-Tune conditions.

pdf bib
Double Graph Based Reasoning for Document-level Relation Extraction
Shuang Zeng | Runxin Xu | Baobao Chang | Lei Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across paragraphs. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN), a method to recognize such relations for long paragraphs. GAIN constructs two graphs, a heterogeneous mention-level graph (MG) and an entity-level graph (EG). The former captures complex interaction among different mentions and the latter aggregates mentions underlying for the same entities. Based on the graphs we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at https://github.com/PKUnlp-icler/GAIN.

pdf bib
Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information
Zehui Lin | Xiao Pan | Mingxuan Wang | Xipeng Qiu | Jiangtao Feng | Hao Zhou | Lei Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. Our key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train a mRASP model on 32 language pairs jointly with only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across a diverse settings, including low, medium, rich resource, and as well as transferring to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvement compared to directly training on those target pairs. It is the first time to verify that multiple lowresource language pairs can be utilized to improve rich resource MT. Surprisingly, mRASP is even able to improve the translation quality on exotic languages that never occur in the pretraining corpus. Code, data, and pre-trained models are available at https://github. com/linzehui/mRASP.

pdf bib
On the Sentence Embeddings from Pre-trained Language Models
Bohan Li | Hao Zhou | Junxian He | Mingxuan Wang | Yiming Yang | Lei Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.

pdf bib
Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach
Maosen Zhang | Nan Jiang | Lei Li | Yexiang Xue
Findings of the Association for Computational Linguistics: EMNLP 2020

Generating natural language under complex constraints is a principled formulation towards controllable text generation. We present a framework to allow specification of combinatorial constraints for sentence generation. We propose TSMC, an efficient method to generate high likelihood sentences with respect to a pre-trained language model while satisfying the constraints. Our approach is highly flexible, requires no task-specific train- ing, and leverages efficient constraint satisfaction solving techniques. To better handle the combinatorial constraints, a tree search algorithm is embedded into the proposal process of the Markov Chain Monte Carlo (MCMC) to explore candidates that satisfy more constraints. Compared to existing MCMC approaches, our sampling approach has a better mixing performance. Experiments show that TSMC achieves consistent and significant improvement on multiple language generation tasks.

pdf bib
Active Sentence Learning by Adversarial Uncertainty Sampling in Discrete Space
Dongyu Ru | Jiangtao Feng | Lin Qiu | Hao Zhou | Mingxuan Wang | Weinan Zhang | Yong Yu | Lei Li
Findings of the Association for Computational Linguistics: EMNLP 2020

Active learning for sentence understanding aims at discovering informative unlabeled data for annotation and therefore reducing the demand for labeled data. We argue that the typical uncertainty sampling method for active learning is time-consuming and can hardly work in real-time, which may lead to ineffective sample selection. We propose adversarial uncertainty sampling in discrete space (AUSDS) to retrieve informative unlabeled samples more efficiently. AUSDS maps sentences into latent space generated by the popular pre-trained language models, and discover informative unlabeled text samples for annotation via adversarial attack. The proposed approach is extremely efficient compared with traditional uncertainty sampling with more than 10x speedup. Experimental results on five datasets show that AUSDS outperforms strong baselines on effectiveness.

2019

pdf bib
In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes
Lei Li | Wei Liu | Marina Litvak | Natalia Vanetik | Zuying Huang
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Various Seq2Seq learning models designed for machine translation were applied for abstractive summarization task recently. Despite these models provide high ROUGE scores, they are limited to generate comprehensive summaries with a high level of abstraction due to its degenerated attention distribution. We introduce Diverse Convolutional Seq2Seq Model(DivCNN Seq2Seq) using Determinantal Point Processes methods(Micro DPPs and Macro DPPs) to produce attention distribution considering both quality and diversity. Without breaking the end to end architecture, DivCNN Seq2Seq achieves a higher level of comprehensiveness compared to vanilla models and strong baselines. All the reproducible codes and datasets are available online.

pdf bib
Towards Linear Time Neural Machine Translation with Capsule Networks
Mingxuan Wang | Jun Xie | Zhixing Tan | Jinsong Su | Deyi Xiong | Lei Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this study, we first investigate a novel capsule network with dynamic routing for linear time Neural Machine Translation (NMT), referred as CapsNMT. CapsNMT uses an aggregation mechanism to map the source sentence into a matrix with pre-determined size, and then applys a deep LSTM network to decode the target sequence from the source representation. Unlike the previous work (CITATION) to store the source sentence with a passive and bottom-up way, the dynamic routing policy encodes the source sentence with an iterative process to decide the credit attribution between nodes from lower and higher layers. CapsNMT has two core properties: it runs in time that is linear in the length of the sequences and provides a more flexible way to aggregate the part-whole information of the source sentence. On WMT14 English-German task and a larger WMT14 English-French task, CapsNMT achieves comparable results with the Transformer system. To the best of our knowledge, this is the first work that capsule networks have been empirically investigated for sequence to sequence problems.

pdf bib
Pun-GAN: Generative Adversarial Network for Pun Generation
Fuli Luo | Shunyao Li | Pengcheng Yang | Lei Li | Baobao Chang | Zhifang Sui | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we focus on the task of generating a pun sentence given a pair of word senses. A major challenge for pun generation is the lack of large-scale pun corpus to guide supervised learning. To remedy this, we propose an adversarial generative network for pun generation (Pun-GAN). It consists of a generator to produce pun sentences, and a discriminator to distinguish between the generated pun sentences and the real sentences with specific word senses. The output of the discriminator is then used as a reward to train the generator via reinforcement learning, encouraging it to produce pun sentences which can support two word senses simultaneously. Experiments show that the proposed Pun-GAN can generate sentences that are more ambiguous and diverse in both automatic and human evaluation.

bib
Discreteness in Neural Natural Language Processing
Lili Mou | Hao Zhou | Lei Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts

This tutorial provides a comprehensive guide to the process of discreteness in neural NLP.As a gentle start, we will briefly introduce the background of deep learning based NLP, where we point out the ubiquitous discreteness of natural language and its challenges in neural information processing. Particularly, we will focus on how such discreteness plays a role in the input space, the latent space, and the output space of a neural network. In each part, we will provide examples, discuss machine learning techniques, as well as demonstrate NLP applications.

pdf bib
Enhancing Topic-to-Essay Generation with External Commonsense Knowledge
Pengcheng Yang | Lei Li | Fuli Luo | Tianyu Liu | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic topic-to-essay generation is a challenging task since it requires generating novel, diverse, and topic-consistent paragraph-level text with a set of topics as input. Previous work tends to perform essay generation based solely on the given topics while ignoring massive commonsense knowledge. However, this commonsense knowledge provides additional background information, which can help to generate essays that are more novel and diverse. Towards filling this gap, we propose to integrate commonsense from the external knowledge base into the generator through dynamic memory mechanism. Besides, the adversarial training based on a multi-label discriminator is employed to further improve topic-consistency. We also develop a series of automatic evaluation metrics to comprehensively assess the quality of the generated essay. Experiments show that with external commonsense knowledge and adversarial training, the generated essays are more novel, diverse, and topic-consistent than existing methods in terms of both automatic and human evaluation.

pdf bib
Cross-Modal Commentator: Automatic Machine Commenting Based on Cross-Modal Information
Pengcheng Yang | Zhihan Zhang | Fuli Luo | Lei Li | Chengyang Huang | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic commenting of online articles can provide additional opinions and facts to the reader, which improves user experience and engagement on social media platforms. Previous work focuses on automatic commenting based solely on textual content. However, in real-scenarios, online articles usually contain multiple modal contents. For instance, graphic news contains plenty of images in addition to text. Contents other than text are also vital because they are not only more attractive to the reader but also may provide critical information. To remedy this, we propose a new task: cross-model automatic commenting (CMAC), which aims to make comments by integrating multiple modal contents. We construct a large-scale dataset for this task and explore several representative methods. Going a step further, an effective co-attention model is presented to capture the dependency between textual and visual information. Evaluation results show that our proposed model can achieve better performance than competitive baselines.

pdf bib
Generating Fluent Adversarial Examples for Natural Languages
Huangzhao Zhang | Hao Zhou | Ning Miao | Lei Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHAoutperforms the baseline model on attacking capability. Adversarial training with MHA also leads to better robustness and performance.

pdf bib
Generating Sentences from Disentangled Syntactic and Semantic Spaces
Yu Bao | Hao Zhou | Shujian Huang | Lei Li | Lili Mou | Olga Vechtomova | Xin-yu Dai | Jiajun Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Variational auto-encoders (VAEs) are widely used in natural language generation due to the regularization of the latent space. However, generating sentences from the continuous latent space does not explicitly model the syntactic information. In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. Our proposed method explicitly models syntactic information in the VAE’s latent space by using the linearized tree sequence, leading to better performance of language generation. Additionally, the advantage of sampling in the disentangled syntactic and semantic latent spaces enables us to perform novel applications, such as the unsupervised paraphrase generation and syntax transfer generation. Experimental results show that our proposed model achieves similar or better performance in various tasks, compared with state-of-the-art related work.

pdf bib
Dynamically Fused Graph Network for Multi-hop Reasoning
Lin Qiu | Yunxuan Xiao | Yanru Qu | Hao Zhou | Lei Li | Weinan Zhang | Yong Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Text-based question answering (TBQA) has been studied extensively in recent years. Most existing approaches focus on finding the answer to a question within a single paragraph. However, many difficult questions require multiple supporting evidence from scattered text among two or more documents. In this paper, we propose Dynamically Fused Graph Network (DFGN), a novel method to answer those questions requiring multiple scattered evidence and reasoning over them. Inspired by human’s step-by-step reasoning behavior, DFGN includes a dynamic fusion layer that starts from the entities mentioned in the given query, explores along the entity graph dynamically built from the text, and gradually finds relevant supporting entities from the given documents. We evaluate DFGN on HotpotQA, a public TBQA dataset requiring multi-hop reasoning. DFGN achieves competitive results on the public board. Furthermore, our analysis shows DFGN produces interpretable reasoning chains.

pdf bib
Automatic Generation of Personalized Comment Based on User Profile
Wenhuan Zeng | Abulikemu Abuduweili | Lei Li | Pengcheng Yang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Comments on social media are very diverse, in terms of content, style and vocabulary, which make generating comments much more challenging than other existing natural language generation (NLG) tasks. Besides, since different user has different expression habits, it is necessary to take the user’s profile into consideration when generating comments. In this paper, we introduce the task of automatic generation of personalized comment (AGPC) for social media. Based on tens of thousands of users’ real comments and corresponding user profiles on weibo, we propose Personalized Comment Generation Network (PCGN) for AGPC. The model utilizes user feature embedding with a gated memory and attends to user description to model personality of users. In addition, external user representation is taken into consideration during the decoding to enhance the comments generation. Experimental results show that our model can generate natural, human-like and personalized comments.

pdf bib
Rethinking Text Attribute Transfer: A Lexical Analysis
Yao Fu | Hao Zhou | Jiaze Chen | Lei Li
Proceedings of the 12th International Conference on Natural Language Generation

Text attribute transfer is modifying certain linguistic attributes (e.g. sentiment, style, author-ship, etc.) of a sentence and transforming them from one type to another. In this paper, we aim to analyze and interpret what is changed during the transfer process. We start from the observation that in many existing models and datasets, certain words within a sentence play important roles in determining the sentence attribute class. These words are referred as the Pivot Words. Based on these pivot words, we propose a lexical analysis framework, the Pivot Analysis, to quantitatively analyze the effects of these words in text attribute classification and transfer. We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) to change the attribute of a sentence, many datasets only requires to change certain pivot words; (3) consequently, many transfer models only perform the lexical-level modification,while leaving higher-level sentence structures unchanged. Our work provides an in-depth understanding of linguistic attribute transfer and further identifies the future requirements and challenges of this task

pdf bib
Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus
Wei Liu | Lei Li | Zuying Huang | Yinan Liu
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

MultiLing 2019 Headline Generation Task on Wikipedia Corpus raised a critical and practical problem: multilingual task on low resource corpus. In this paper we proposed QDAS extractive summarization model enhanced by sentence2vec and try to apply transfer learning based on large multilingual pre-trained language model for Wikipedia Headline Generation task. We treat it as sequence labeling task and develop two schemes to handle with it. Experimental results have shown that large pre-trained model can effectively utilize learned knowledge to extract certain phrase using low resource supervised data.

2018

pdf bib
On Tree-Based Neural Sentence Modeling
Haoyue Shi | Hao Zhou | Jiaze Chen | Lei Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural networks with tree-based sentence encoders have shown better results on many downstream tasks. Most of existing tree-based encoders adopt syntactic parsing trees as the explicit structure prior. To study the effectiveness of different tree structures, we replace the parsing trees with trivial trees (i.e., binary balanced tree, left-branching tree and right-branching tree) in the encoders. Though trivial trees contain no syntactic information, those encoders get competitive or even better results on all of the ten downstream tasks we investigated. This surprising result indicates that explicit syntax guidance may not be the main contributor to the superior performances of tree-based neural sentence modeling. Further analysis show that tree modeling gives better results when crucial words are closer to the final representation. Additional experiments give more clues on how to design an effective tree-based encoder. Our code is open-source and available at https://github.com/ExplorerFreda/TreeEnc.

pdf bib
Reinforced Co-Training
Jiawei Wu | Lei Li | William Yang Wang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Co-training is a popular semi-supervised learning framework to utilize a large amount of unlabeled data in addition to a small labeled set. Co-training methods exploit predicted labels on the unlabeled data and select samples based on prediction confidence to augment the training. However, the selection of samples in existing co-training methods is based on a predetermined policy, which ignores the sampling bias between the unlabeled and the labeled subsets, and fails to explore the data space. In this paper, we propose a novel method, Reinforced Co-Training, to select high-quality unlabeled samples to better co-train on. More specifically, our approach uses Q-learning to learn a data selection policy with a small labeled dataset, and then exploits this policy to train the co-training classifiers automatically. Experimental results on clickbait detection and generic text classification tasks demonstrate that our proposed method can obtain more accurate text classification results.

2017

pdf bib
Word Embedding and Topic Modeling Enhanced Multiple Features for Content Linking and Argument / Sentiment Labeling in Online Forums
Lei Li | Liyuan Mao | Moye Chen
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

Multiple grammatical and semantic features are adopted in content linking and argument/sentiment labeling for online forums in this paper. There are mainly two different methods for content linking. First, we utilize the deep feature obtained from Word Embedding Model in deep learning and compute sentence similarity. Second, we use multiple traditional features to locate candidate linking sentences, and then adopt a voting method to obtain the final result. LDA topic modeling is used to mine latent semantic feature and K-means clustering is implemented for argument labeling, while features from sentiment dictionaries and rule-based sentiment analysis are integrated for sentiment labeling. Experimental results have shown that our methods are valid.

pdf bib
Enhancing Automatic ICD-9-CM Code Assignment for Medical Texts with PubMed
Danchen Zhang | Daqing He | Sanqiang Zhao | Lei Li
BioNLP 2017

Assigning a standard ICD-9-CM code to disease symptoms in medical texts is an important task in the medical domain. Automating this process could greatly reduce the costs. However, the effectiveness of an automatic ICD-9-CM code classifier faces a serious problem, which can be triggered by unbalanced training data. Frequent diseases often have more training data, which helps its classification to perform better than that of an infrequent disease. However, a disease’s frequency does not necessarily reflect its importance. To resolve this training data shortage problem, we propose to strategically draw data from PubMed to enrich the training data when there is such need. We validate our method on the CMC dataset, and the evaluation results indicate that our method can significantly improve the code assignment classifiers’ performance at the macro-averaging level.

2016

pdf bib
CIST System for CL-SciSumm 2016 Shared Task
Lei Li | Liyuan Mao | Yazhao Zhang | Junqi Chi | Taiwen Huang | Xiaoyue Cong | Heng Peng
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

pdf bib
CFO: Conditional Focused Neural Question Answering with Large-scale Knowledge Bases
Zihang Dai | Lei Li | Wei Xu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2014

pdf bib
Personal Attributes Extraction Based on the Combination of Trigger Words, Dictionary and Rules
Kailun Zhang | Mingyin Wang | Xiaoyue Cong | Fang Huang | Hongfa Xue | Lei Li | Zhiqiao Gao
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

2013

pdf bib
Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian
Lei Li | Corina Forascu | Mahmoud El-Haj | George Giannakopoulos
Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization

pdf bib
CIST System Report for ACL MultiLing 2013 – Track 1: Multilingual Multi-document Summarization
Lei Li | Wei Heng | Jia Yu | Yu Liu | Shuhong Wan
Proceedings of the MultiLing 2013 Workshop on Multilingual Multi-document Summarization

2006

pdf bib
Research on Olympics-oriented Mobile Game News Ordering System
Yonggui Yang | Lei Li
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

Search
Co-authors