Yiming Cui


TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Ziqing Yang | Yiming Cui | Zhigang Chen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Pre-trained language models have become prevalent in natural language processing and serve as the backbones of many NLP tasks, but their demands for computational resources have limited their applications. In this paper, we introduce TextPruner, an open-source model pruning toolkit designed for pre-trained language models, targeting fast and easy model compression. TextPruner offers structured post-training pruning methods, including vocabulary pruning and transformer pruning, and can be applied to various models and tasks. We also propose a self-supervised pruning method that can be applied without labeled data. Our experiments with several NLP tasks demonstrate the ability of TextPruner to reduce the model size without re-training the model.
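
As a rough illustration of what vocabulary pruning can mean, the sketch below drops every token that never occurs in a target corpus and shrinks the embedding table to match. It is a minimal sketch under stated assumptions, not TextPruner's API: the function name `prune_vocabulary`, the list-of-rows embedding layout, and the `keep_ids` argument for special tokens are all hypothetical.

```python
def prune_vocabulary(vocab, embeddings, corpus_token_ids, keep_ids=()):
    """Keep only tokens seen in the corpus (plus any ids in keep_ids).

    vocab: list of token strings, indexed by token id (assumed layout).
    embeddings: list of embedding rows, one per token id (assumed layout).
    corpus_token_ids: iterable of token ids observed in the target corpus.
    Returns the reduced vocab, the reduced embeddings, and an old-id to new-id map.
    """
    used = set(corpus_token_ids) | set(keep_ids)
    # Preserve the original ordering of the surviving tokens.
    kept = [i for i in range(len(vocab)) if i in used]
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_vocab = [vocab[i] for i in kept]
    new_embeddings = [embeddings[i] for i in kept]
    return new_vocab, new_embeddings, old_to_new


# Toy example: a five-token vocabulary, of which the corpus uses two.
vocab = ["[PAD]", "hello", "world", "rare", "unused"]
embeddings = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
new_vocab, new_embeddings, old_to_new = prune_vocabulary(
    vocab, embeddings, corpus_token_ids=[1, 2, 1], keep_ids=[0]
)
```

Any token ids stored in existing data would need to be remapped through `old_to_new` before being fed to the pruned model.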


Adversarial Training for Machine Reading Comprehension with Virtual Embeddings
Ziqing Yang | Yiming Cui | Chenglei Si | Wanxiang Che | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Adversarial training (AT) as a regularization method has proved its effectiveness on various tasks. Though there are successful applications of AT on some NLP tasks, the distinguishing characteristics of NLP tasks have not been exploited. In this paper, we aim to apply AT on machine reading comprehension (MRC) tasks. Furthermore, we adapt AT for MRC tasks by proposing a novel adversarial training method called PQAT that perturbs the embedding matrix instead of word vectors. To differentiate the roles of passages and questions, PQAT uses additional virtual P/Q-embedding matrices to gather the global perturbations of words from passages and questions separately. We test the method on a wide range of MRC tasks, including span-based extractive RC and multiple-choice RC. The results show that adversarial training is effective universally, and PQAT further improves the performance.

Bilingual Alignment Pre-Training for Zero-Shot Cross-Lingual Transfer
Ziqing Yang | Wentao Ma | Yiming Cui | Jiani Ye | Wanxiang Che | Shijin Wang
Proceedings of the 3rd Workshop on Machine Reading for Question Answering

Multilingual pre-trained models have achieved remarkable performance on cross-lingual transfer learning. Some multilingual models, such as mBERT, have been pre-trained on unlabeled corpora; therefore, the embeddings of different languages in the models may not be aligned very well. In this paper, we aim to improve the zero-shot cross-lingual transfer performance by proposing a pre-training task named Word-Exchange Aligning Model (WEAM), which uses the statistical alignment information as the prior knowledge to guide cross-lingual word prediction. We evaluate our model on the multilingual machine reading comprehension task MLQA and the natural language inference task XNLI. The results show that WEAM can significantly improve the zero-shot performance.

Benchmarking Robustness of Machine Reading Comprehension Models
Chenglei Si | Ziqing Yang | Yiming Cui | Wentao Ma | Ting Liu | Shijin Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


CharBERT: Character-aware Pre-trained Language Model
Wentao Ma | Yiming Cui | Chenglei Si | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of the 28th International Conference on Computational Linguistics

Most pre-trained language models (PLMs) construct word representations at subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocab) words are almost avoidable. However, those methods split a word into subword units and make the representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT, improving on the previous methods (such as BERT and RoBERTa) to tackle these problems. We first construct the contextual word embedding for each token from the sequential character representations, then fuse the representations of characters and the subword representations by a novel heterogeneous interaction module. We also propose a new pre-training task named NLM (Noisy LM) for unsupervised character representation learning. We evaluate our method on question answering, sequence labeling, and text classification tasks, both on the original datasets and adversarial misspelling test sets. The experimental results show that our method can significantly improve the performance and robustness of PLMs simultaneously.

CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu | Hai Hu | Xuanwei Zhang | Lu Li | Chenjie Cao | Yudong Li | Yechen Xu | Kai Sun | Dian Yu | Cong Yu | Yin Tian | Qianqian Dong | Weitang Liu | Bo Shi | Yiming Cui | Junyi Li | Jun Zeng | Rongzhao Wang | Weijian Xie | Yanting Li | Yina Patterson | Zuoyu Tian | Yiwen Zhang | He Zhou | Shaoweihua Liu | Zhe Zhao | Qipeng Zhao | Cong Yue | Xinrui Zhang | Zhengliang Yang | Kyle Richardson | Zhenzhong Lan
Proceedings of the 28th International Conference on Computational Linguistics

The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue, we introduce the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark. CLUE is an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text. To establish results on these tasks, we report scores using an exhaustive set of current state-of-the-art pre-trained Chinese models (9 in total). We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on Chinese NLU. Our benchmark is released at https://www.cluebenchmarks.com

A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
Yiming Cui | Ting Liu | Ziqing Yang | Zhipeng Chen | Wentao Ma | Wanxiang Che | Shijin Wang | Guoping Hu
Proceedings of the 28th International Conference on Computational Linguistics

Owing to the continuous efforts by the Chinese NLP community, more and more Chinese machine reading comprehension datasets have become available. To add diversity in this area, in this paper, we propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC). The proposed task aims to fill the right candidate sentence into the passage that has several blanks. We built a Chinese dataset called CMRC 2019 to evaluate the difficulty of the SC-MRC task. Moreover, to add more difficulty, we also made fake candidates that are similar to the correct ones, which requires the machine to judge their correctness in the context. The proposed dataset contains over 100K blanks (questions) within over 10K passages, which originated from Chinese narrative stories. To evaluate the dataset, we implement several baseline systems based on the pre-trained models, and the results show that the state-of-the-art model still underperforms human performance by a large margin. We release the dataset and baseline system to further facilitate our community. Resources available through https://github.com/ymcui/cmrc2019

Conversational Word Embedding for Retrieval-Based Dialog System
Wentao Ma | Yiming Cui | Ting Liu | Dong Wang | Shijin Wang | Guoping Hu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Human conversations contain many types of information, e.g., knowledge, common sense, and language habits. In this paper, we propose a conversational word embedding method named PR-Embedding, which utilizes the conversation pairs <post, reply> to learn word embedding. Different from previous works, PR-Embedding uses the vectors from two different semantic spaces to represent the words in post and reply. To catch the information among the pair, we first introduce the word alignment model from statistical machine translation to generate the cross-sentence window, then train the embedding on word-level and sentence-level. We evaluate the method on single-turn and multi-turn response selection tasks for retrieval-based dialog systems. The experimental results show that PR-Embedding can improve the quality of the selected response.

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Ziqing Yang | Yiming Cui | Zhipeng Chen | Wanxiang Che | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit designed for natural language processing. It works with different neural network models and supports various kinds of supervised learning tasks, such as text classification, reading comprehension, and sequence labeling. TextBrewer provides a simple and uniform workflow that enables quick setup of distillation experiments with highly flexible configurations. It offers a set of predefined distillation methods and can be extended with custom code. As a case study, we use TextBrewer to distill BERT on several typical NLP tasks. With simple configurations, we achieve results that are comparable with or even higher than the public distilled BERT models with similar numbers of parameters.

Is Graph Structure Necessary for Multi-hop Question Answering?
Nan Shao | Yiming Cui | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recently, attempting to model texts as graph structure and introducing graph neural networks to deal with it has become a trend in many NLP research areas. In this paper, we investigate whether the graph structure is necessary for textual multi-hop reasoning. Our analysis is centered on HotpotQA. We construct a strong baseline model to establish that, with the proper use of pre-trained models, graph structure may not be necessary for textual multi-hop reasoning. We point out that both graph structure and adjacency matrix are task-related prior knowledge, and graph-attention can be considered as a special case of self-attention. Experiments demonstrate that graph-attention or the entire graph structure can be replaced by self-attention or Transformers.
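
The claim that graph-attention is a special case of self-attention can be illustrated with a small sketch: attention weights are computed as usual, but scores between nodes that are not connected in the adjacency matrix are masked out before the softmax. With a fully connected adjacency matrix the mask does nothing and the result is plain self-attention. The function name and the list-of-lists layout are assumptions made for this illustration, and each node is assumed to have at least one neighbor (for example a self-loop).

```python
import math


def masked_attention_weights(scores, adjacency):
    """Row-wise softmax over scores, restricted to entries where adjacency is 1.

    scores: n x n list of raw attention scores (assumed precomputed).
    adjacency: n x n list of 0/1 flags; every row must contain at least one 1.
    """
    weights = []
    for row_scores, row_adj in zip(scores, adjacency):
        # Unconnected pairs get a score of -inf, so their weight becomes 0.
        masked = [s if a else float("-inf") for s, a in zip(row_scores, row_adj)]
        peak = max(masked)
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[1.0, 2.0], [0.5, 0.5]]
full = masked_attention_weights(scores, [[1, 1], [1, 1]])    # plain self-attention
sparse = masked_attention_weights(scores, [[1, 0], [1, 1]])  # graph-restricted
```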

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting
Sanyuan Chen | Yutai Hou | Yiming Cui | Wanxiang Che | Ting Liu | Xiangzhan Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Deep pretrained language models have achieved great success in the way of pretraining first and then fine-tuning. But such a sequential transfer learning paradigm often confronts the catastrophic forgetting problem and leads to sub-optimal performance. To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Specifically, we introduce a Pretraining Simulation mechanism to recall the knowledge from pretraining tasks without data, and an Objective Shifting mechanism to focus the learning on downstream tasks gradually. Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. Our method also enables BERT-base to achieve better average performance than direct fine-tuning of BERT-large. Further, we provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into the Adam optimizer, to facilitate the NLP community.
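
One way to picture the two mechanisms is sketched below. The abstract does not spell out the formulas, so everything here is an assumption for illustration: the "recall" term is modeled as a quadratic penalty that pulls parameters toward their pretrained values (which needs no pretraining data), and the shift toward the downstream objective is modeled as a sigmoid schedule over training steps. The names `shift_weight`, `recall_penalty`, `combined_loss` and the hyperparameters `k`, `t0`, `gamma` are hypothetical.

```python
import math


def shift_weight(step, k=0.1, t0=1000):
    """Weight on the downstream loss; rises from about 0 to about 1 around step t0."""
    return 1.0 / (1.0 + math.exp(-k * (step - t0)))


def recall_penalty(params, pretrained_params, gamma=1.0):
    """Quadratic distance to the pretrained parameters (assumed stand-in for recall)."""
    return 0.5 * gamma * sum((p - p0) ** 2 for p, p0 in zip(params, pretrained_params))


def combined_loss(task_loss, params, pretrained_params, step, k=0.1, t0=1000, gamma=1.0):
    """Blend the downstream loss and the recall penalty with a step-dependent weight."""
    lam = shift_weight(step, k, t0)
    return lam * task_loss + (1.0 - lam) * recall_penalty(params, pretrained_params, gamma)


# Early in training the penalty dominates; late in training the task loss dominates.
early = combined_loss(2.0, [1.0], [0.0], step=0)
late = combined_loss(2.0, [1.0], [0.0], step=2000)
```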

Revisiting Pre-Trained Models for Chinese Natural Language Processing
Yiming Cui | Wanxiang Che | Ting Liu | Bing Qin | Shijin Wang | Guoping Hu
Findings of the Association for Computational Linguistics: EMNLP 2020

Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of the pre-trained language models. In this paper, we aim to revisit Chinese pre-trained language models to examine their effectiveness in a non-English language and release the Chinese pre-trained language model series to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially the masking strategy that adopts MLM as correction (Mac). We carried out extensive experiments on eight Chinese NLP tasks to revisit the existing pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT can achieve state-of-the-art performance on many NLP tasks, and we also ablate details with several findings that may help future research. https://github.com/ymcui/MacBERT


TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-Based Chatbots
Wentao Ma | Yiming Cui | Nan Shao | Su He | Wei-Nan Zhang | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We observe that the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose the model TripleNet to fully model the task with the triple <context, query, response> instead of <context, response> in previous works. The heart of TripleNet is a novel attention mechanism named triple attention to model the relationships within the triple at four levels. The new mechanism updates the representation of each element based on the attention with the other two concurrently and symmetrically. We match the triple <C, Q, R> centered on the response from char to context level for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model can significantly outperform the state-of-the-art methods.

Cross-Lingual Machine Reading Comprehension
Yiming Cui | Wanxiang Che | Ting Liu | Bing Qin | Shijin Wang | Guoping Hu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Though the community has made great progress on the Machine Reading Comprehension (MRC) task, most previous works address English-based MRC problems, and there have been few efforts on other languages, mainly due to the lack of large-scale training data. In this paper, we propose the Cross-Lingual Machine Reading Comprehension (CLMRC) task for languages other than English. First, we present several back-translation approaches for the CLMRC task, which are straightforward to adopt. However, exactly aligning the answer into the source language is difficult and could introduce additional noise. In this context, we propose a novel model called Dual BERT, which takes advantage of the large-scale training data provided by a rich-resource language (such as English), learns the semantic relations between the passage and question in a bilingual context, and then utilizes the learned knowledge to improve the reading comprehension performance of a low-resource language. We conduct experiments on two Chinese machine reading comprehension datasets, CMRC 2018 and DRCD. The results show consistent and significant improvements over various state-of-the-art systems by a large margin, which demonstrates the potential of the CLMRC task. Resources available: https://github.com/ymcui/Cross-Lingual-MRC

A Span-Extraction Dataset for Chinese Machine Reading Comprehension
Yiming Cui | Ting Liu | Wanxiang Che | Li Xiao | Zhipeng Chen | Wentao Ma | Shijin Wang | Guoping Hu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, the existing reading comprehension datasets are mostly in English. In this paper, we introduce a Span-Extraction dataset for Chinese machine reading comprehension to add language diversity in this area. The dataset is composed of nearly 20,000 real questions annotated on Wikipedia paragraphs by human experts. We also annotated a challenge set which contains the questions that need comprehensive understanding and multi-sentence inference throughout the context. We present several baseline systems as well as anonymous submissions to demonstrate the difficulties in this dataset. With the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We hope the release of the dataset could further accelerate Chinese machine reading comprehension research. Resources are available: https://github.com/ymcui/cmrc2018


Dataset for the First Evaluation on Chinese Machine Reading Comprehension
Yiming Cui | Ting Liu | Zhipeng Chen | Wentao Ma | Shijin Wang | Guoping Hu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Context-Sensitive Generation of Open-Domain Conversational Responses
Weinan Zhang | Yiming Cui | Yifa Wang | Qingfu Zhu | Lingzhi Li | Lianqiang Zhou | Ting Liu
Proceedings of the 27th International Conference on Computational Linguistics

Despite the success of existing works on single-turn conversation generation, human conversation is actually a context-sensitive process once coherence is taken into consideration. Inspired by the existing studies, this paper proposes static and dynamic attention-based approaches for context-sensitive generation of open-domain conversational responses. Experimental results on two public datasets show that the proposed static attention-based approach outperforms all the baselines on automatic and human evaluation.


Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution
Ting Liu | Yiming Cui | Qingyu Yin | Wei-Nan Zhang | Shijin Wang | Guoping Hu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most existing approaches for zero pronoun resolution rely heavily on annotated data, which is often released by shared task organizers. Therefore, the lack of annotated data becomes a major obstacle in the progress of the zero pronoun resolution task. Also, it is expensive to spend manpower on labeling the data for better performance. To alleviate the problem above, in this paper, we propose a simple but novel approach to automatically generate large-scale pseudo training data for zero pronoun resolution. Furthermore, we successfully transfer the cloze-style reading comprehension neural network model into the zero pronoun resolution task and propose a two-step training mechanism to overcome the gap between the pseudo training data and the real one. Experimental results show that the proposed approach significantly outperforms the state-of-the-art systems with an absolute improvement of 3.1% F-score on OntoNotes 5.0 data.

Attention-over-Attention Neural Networks for Reading Comprehension
Yiming Cui | Zhipeng Chen | Si Wei | Shijin Wang | Ting Liu | Guoping Hu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cloze-style reading comprehension is a representative problem in mining the relationship between document and query. In this paper, we present a simple but novel model called the attention-over-attention reader to better solve the cloze-style reading comprehension task. The proposed model aims to place another attention mechanism over the document-level attention and induces “attended attention” for final answer predictions. One advantage of our model is that it is simpler than related works while giving excellent performance. In addition to the primary model, we also propose an N-best re-ranking strategy to double check the validity of the candidates and further improve the performance. Experimental results show that the proposed methods significantly outperform various state-of-the-art systems by a large margin in public datasets, such as CNN and Children’s Book Test.
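
The idea of placing a second attention over the document-level attention can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the authors' implementation: token representations are taken as given vectors, matching scores are plain dot products, and the function names `attention_over_attention` and `score_candidates` are hypothetical. For each query token, a softmax over document tokens gives a document-level attention; a softmax over query tokens, averaged across the document, weights how much each query token matters; the weighted sum of the two is the "attended attention".

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_attention(doc_vecs, query_vecs):
    """Return one attention weight per document token (weights sum to 1)."""
    n_doc, n_query = len(doc_vecs), len(query_vecs)
    # Pairwise matching scores between every document token and every query token.
    match = [[sum(d * q for d, q in zip(dv, qv)) for qv in query_vecs] for dv in doc_vecs]
    # Document-level attention: one distribution over document tokens per query token.
    doc_att = [softmax([match[i][j] for i in range(n_doc)]) for j in range(n_query)]
    # Query-level attention: one distribution over query tokens per document token,
    # averaged over the document to get a single importance weight per query token.
    query_att = [softmax(match[i]) for i in range(n_doc)]
    query_importance = [sum(query_att[i][j] for i in range(n_doc)) / n_doc for j in range(n_query)]
    # Attended attention: document-level attentions weighted by query-token importance.
    return [sum(doc_att[j][i] * query_importance[j] for j in range(n_query)) for i in range(n_doc)]


def score_candidates(doc_tokens, weights):
    """Sum the attention mass over every position where each token occurs."""
    scores = {}
    for token, weight in zip(doc_tokens, weights):
        scores[token] = scores.get(token, 0.0) + weight
    return scores


doc_tokens = ["Mary", "park", "Mary"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query_vecs = [[1.0, 0.0]]
weights = attention_over_attention(doc_vecs, query_vecs)
candidate_scores = score_candidates(doc_tokens, weights)
best = max(candidate_scores, key=candidate_scores.get)
```

Summing weights over repeated occurrences of a word lets a candidate that appears several times in the document accumulate evidence from each position.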


LSTM Neural Reordering Feature for Statistical Machine Translation
Yiming Cui | Shijin Wang | Jianfeng Li
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Consensus Attention-based Neural Networks for Chinese Reading Comprehension
Yiming Cui | Ting Liu | Zhipeng Chen | Shijin Wang | Guoping Hu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Reading comprehension has seen a boom in recent NLP research. Several institutes have released Cloze-style reading comprehension data, and these have greatly accelerated the research of machine comprehension. In this work, we first present Chinese reading comprehension datasets, which consist of the People Daily news dataset and the Children’s Fairy Tale (CFT) dataset. Also, we propose a consensus attention-based neural network architecture to tackle the Cloze-style reading comprehension problem, which aims to induce a consensus attention over every word in the query. Experimental results show that the proposed neural network significantly outperforms the state-of-the-art baselines in several public datasets. Furthermore, we set up a baseline for the Chinese reading comprehension task, and hopefully this will speed up the process for future research.


The USTC machine translation system for IWSLT 2014
Shijin Wang | Yuguang Wang | Jianfeng Li | Yiming Cui | Lirong Dai
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign


The HIT-LTRC machine translation system for IWSLT 2012
Xiaoning Zhu | Yiming Cui | Conghui Zhu | Tiejun Zhao | Hailong Cao
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe HIT-LTRC's participation in the IWSLT 2012 evaluation campaign. This year, we took part in the Olympics Task, which required the participants to translate Chinese to English with limited data. Our system is based on Moses[1], which is an open source machine translation system. We mainly used phrase-based models to carry out our experiments, and factored models were also tested for comparison. All the involved tools are freely available. In the evaluation campaign, we focus on data selection, phrase extraction method comparison and phrase table combination.