Mu Li


2023

Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation
Yuxin Ren | Zihan Zhong | Xingjian Shi | Yi Zhu | Chun Yuan | Mu Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. In order to enhance the guidance of the teacher training process, we introduce the concept of distillation influence to determine the impact of distillation from each training sample on the student’s generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique for incorporating distillation influence into the teacher’s learning process. By prioritizing samples that are likely to enhance the student’s generalization ability, our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
Jiaao Chen | Aston Zhang | Mu Li | Alex Smola | Diyi Yang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. However, as a method inherently built for continuous data, existing diffusion models still have some limitations in modeling discrete data, e.g., languages. For example, the generally used Gaussian noise cannot handle the discrete corruption well, and the objectives in continuous spaces fail to be stable for textual data in the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce a novel diffusion model for language modeling, Masked-Diffuse LM, with lower training cost and better performance, inspired by linguistic features in languages. Specifically, we design a linguistic-informed forward process which adds corruptions to the text through strategic soft-masking to better noise the textual data. Also, we directly predict the categorical distribution with a cross-entropy loss function in every diffusion step to connect the continuous space and discrete space in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.

2022

Modeling Multi-Granularity Hierarchical Features for Relation Extraction
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Relation extraction is a key task in Natural Language Processing (NLP), which aims to extract relations between entity pairs from given texts. Recently, relation extraction (RE) has achieved remarkable progress with the development of deep neural networks. Most existing research focuses on constructing explicit structured features using external knowledge such as knowledge graphs and dependency trees. In this paper, we propose a novel method to extract multi-granularity features based solely on the original input sentences. We show that effective structured features can be attained even without external knowledge. Three kinds of features based on the input sentences are fully exploited: at the entity mention level, the segment level, and the sentence level. All three are jointly and hierarchically modeled. We evaluate our method on three public benchmarks: SemEval 2010 Task 8, Tacred, and Tacred Revisited. To verify the effectiveness, we apply our method to different encoders such as LSTM and BERT. Experimental results show that our method significantly outperforms existing state-of-the-art models, even those that use external knowledge. Extensive analyses demonstrate that the performance of our model comes from capturing multi-granularity features and modeling their hierarchical structure.

Learning Confidence for Transformer-based Neural Machine Translation
Yu Lu | Jiali Zeng | Jiajun Zhang | Shuangzhi Wu | Mu Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Confidence estimation aims to quantify the confidence of the model prediction, providing an expectation of success. A well-calibrated confidence estimate enables accurate failure prediction and proper risk measurement when given noisy samples and out-of-distribution data in real-world settings. However, this task remains a severe challenge for neural machine translation (NMT), where probabilities from the softmax distribution fail to describe when the model is probably mistaken. To address this problem, we propose to learn an unsupervised confidence estimate jointly with the training of the NMT model. We explain confidence as how many hints the NMT model needs to make a correct prediction, where more hints indicate lower confidence. Specifically, the NMT model is given the option to ask for hints to improve translation accuracy at the cost of a slight penalty. Then, we approximate its level of confidence by counting the number of hints the model uses. We demonstrate that our learned confidence estimate achieves high accuracy on extensive sentence/word-level quality estimation tasks. Analytical results verify that our confidence estimate can correctly assess underlying risk in two real-world scenarios: (1) discovering noisy samples and (2) detecting out-of-domain data. We further propose a novel confidence-based instance-specific label smoothing approach based on our learned confidence estimate, which outperforms standard label smoothing.

Task-guided Disentangled Tuning for Pretrained Language Models
Jiali Zeng | Yufan Jiang | Shuangzhi Wu | Yongjing Yin | Mu Li
Findings of the Association for Computational Linguistics: ACL 2022

Pretrained language models (PLMs) trained on large-scale unlabeled corpora are typically fine-tuned on task-specific downstream datasets, an approach that has produced state-of-the-art results on various NLP tasks. However, the data discrepancy in domain and scale makes fine-tuning fail to efficiently capture task-specific patterns, especially in the low-data regime. To address this issue, we propose Task-guided Disentangled Tuning (TDT) for PLMs, which enhances the generalization of representations by disentangling task-relevant signals from the entangled representations. For a given task, we introduce a learnable confidence model to detect indicative guidance from context, and further propose a disentangled regularization to mitigate the over-reliance problem. Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs, and extensive analysis demonstrates the effectiveness and robustness of our method. Code is available at https://github.com/lemon0830/TDT.

An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework Based on Semantic Blocks
Xinnian Liang | Jing Li | Shuangzhi Wu | Jiali Zeng | Yufan Jiang | Mu Li | Zhoujun Li
Proceedings of the 29th International Conference on Computational Linguistics

Unsupervised summarization methods have achieved remarkable results by incorporating representations from pre-trained language models. However, existing methods fail to consider efficiency and effectiveness at the same time when the input document is extremely long. To tackle this problem, in this paper, we propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization, which is based on the semantic block. The semantic block refers to continuous sentences in the document that describe the same facet. Specifically, we address this problem by converting the one-step ranking method into a hierarchical multi-granularity two-stage ranking. In the coarse-level stage, we propose a new segmentation algorithm to split the document into facet-aware semantic blocks and then filter out insignificant blocks. In the fine-level stage, we select salient sentences in each block and then extract the final summary from the selected sentences. We evaluate our framework on four long document summarization datasets: Gov-Report, BillSum, arXiv, and PubMed. Our C2F-FAR achieves new state-of-the-art unsupervised summarization results on Gov-Report and BillSum. In addition, our method is 4-28 times faster than previous methods.

2021

Attention Calibration for Transformer in Neural Machine Translation
Yu Lu | Jiali Zeng | Jiajun Zhang | Shuangzhi Wu | Mu Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Attention mechanisms have achieved substantial improvements in neural machine translation by dynamically selecting relevant inputs for different predictions. However, recent studies have questioned the attention mechanisms’ capability for discovering decisive inputs. In this paper, we propose to calibrate the attention weights by introducing a mask perturbation model that automatically evaluates each input’s contribution to the model outputs. We increase the attention weights assigned to the indispensable tokens, whose removal leads to a dramatic performance decrease. The extensive experiments on the Transformer-based translation have demonstrated the effectiveness of our model. We further find that the calibrated attention weights are more uniform at lower layers to collect multiple information while more concentrated on the specific inputs at higher layers. Detailed analyses also show a great need for calibration in the attention weights with high entropy where the model is unconfident about its decision.

Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
Haoyu He | Xingjian Shi | Jonas Mueller | Sheng Zha | Mu Li | George Karypis
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Knowledge Distillation (KD) offers a natural way to reduce the latency and memory/energy usage of massive pretrained models that have come to dominate Natural Language Processing (NLP) in recent years. While numerous sophisticated variants of KD algorithms have been proposed for NLP applications, the key factors underpinning the optimal distillation performance are often confounded and remain unclear. We aim to identify how different components in the KD pipeline affect the resulting performance and how much the optimal KD pipeline varies across different datasets/tasks, such as the data augmentation policy, the loss function, and the intermediate representation for transferring the knowledge between teacher and student. To tease apart their effects, we propose Distiller, a meta KD framework that systematically combines a broad range of techniques across different stages of the KD pipeline, which enables us to quantify each component’s contribution. Within Distiller, we unify commonly used objectives for distillation of intermediate representations under a universal mutual information (MI) objective and propose a class of MI-α objective functions with better bias/variance trade-off for estimating the MI between the teacher and the student. On a diverse set of NLP datasets, the best Distiller configurations are identified via large-scale hyper-parameter optimization. Our experiments reveal the following: 1) the approach used to distill the intermediate representations is the most important factor in KD performance, 2) among different objectives for intermediate distillation, MI-α performs the best, and 3) data augmentation provides a large boost for small training datasets or small student networks. Moreover, we find that different datasets/tasks prefer different KD algorithms, and thus propose a simple AutoDistiller algorithm that can recommend a good KD pipeline for a new dataset.

Tencent Translation System for the WMT21 News Translation Task
Longyue Wang | Mu Li | Fangxu Liu | Shuming Shi | Zhaopeng Tu | Xing Wang | Shuangzhi Wu | Jiali Zeng | Wen Zhang
Proceedings of the Sixth Conference on Machine Translation

This paper describes Tencent Translation systems for the WMT21 shared task. We participate in the news translation task on three language pairs: Chinese-English, English-Chinese and German-English. Our systems are built on various Transformer models with novel techniques adapted from our recent research work. First, we combine different data augmentation methods including back-translation, forward-translation and right-to-left training to enlarge the training data. We also apply language coverage bias, data rejuvenation and uncertainty-based sampling approaches to select content-relevant and high-quality data from large parallel and monolingual corpora. In addition to in-domain fine-tuning, we also propose a fine-grained “one model one domain” approach to model characteristics of different news genres at the fine-tuning and decoding stages. Besides, we use a greed-based ensemble algorithm and a transductive ensemble method to further boost our systems. Building on our success in the last WMT, we continue to employ advanced techniques such as large batch training, data selection and data filtering. Finally, our constrained Chinese-English system achieves a case-sensitive BLEU score of 33.4, which is the highest among all submissions. The German-English system is ranked second.

Improving Unsupervised Extractive Summarization with Facet-Aware Modeling
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Embedding-based methods are widely used for unsupervised keyphrase extraction (UKE) tasks. Generally, these methods simply calculate similarities between phrase embeddings and the document embedding, which is insufficient to capture different contexts for a more effective UKE model. In this paper, we propose a novel method for UKE, where local and global contexts are jointly modeled. From a global view, we calculate the similarity between a certain phrase and the whole document in the vector space as traditional embedding-based models do. In terms of the local view, we first build a graph structure based on the document where phrases are regarded as vertices and the edges are similarities between vertices. Then, we propose a new centrality computation method to capture local salient information based on the graph structure. Finally, we combine the modeling of global and local context for ranking. We evaluate our models on three public benchmarks (Inspec, DUC 2001, SemEval 2010) and compare with existing state-of-the-art models. The results show that our model outperforms most models while generalizing better on input documents with different domains and lengths. An additional ablation study shows that both the local and global information are crucial for unsupervised keyphrase extraction tasks.

Recurrent Attention for Neural Machine Translation
Jiali Zeng | Shuangzhi Wu | Yongjing Yin | Yufan Jiang | Mu Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further in this research line and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN). RAN directly learns attention weights without any token-to-token interaction and further improves their capacity by layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterpart in certain scenarios, with fewer parameters and less inference time. In particular, when RAN is applied to the decoder of the Transformer, it brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct extensive analysis on the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative for building more effective and efficient NMT models.

2020

Emotion Classification by Jointly Learning to Lexiconize and Classify
Deyu Zhou | Shuangzhi Wu | Qing Wang | Jun Xie | Zhaopeng Tu | Mu Li
Proceedings of the 28th International Conference on Computational Linguistics

Emotion lexicons have been shown effective for emotion classification (Baziotis et al., 2018). Previous studies handle emotion lexicon construction and emotion classification separately. In this paper, we propose an emotional network (EmNet) to jointly learn sentence emotions and construct emotion lexicons which are dynamically adapted to a given context. The dynamic emotion lexicons are useful for handling words with multiple emotions based on different context, which can effectively improve the classification accuracy. We validate the approach on two representative architectures – LSTM and BERT, demonstrating its superiority on identifying emotions in Tweets. Our model outperforms several approaches proposed in previous studies and achieves new state-of-the-art on the benchmark Twitter dataset.

Tencent Neural Machine Translation Systems for the WMT20 News Translation Task
Shuangzhi Wu | Xing Wang | Longyue Wang | Fangxu Liu | Jun Xie | Zhaopeng Tu | Shuming Shi | Mu Li
Proceedings of the Fifth Conference on Machine Translation

This paper describes Tencent Neural Machine Translation systems for the WMT 2020 news translation tasks. We participate in the shared news translation task on English-Chinese and English-German language pairs. Our systems are built on deep Transformer and several data augmentation methods. We propose a boosted in-domain finetuning method to improve single models. Ensemble is used to combine single models and we propose an iterative transductive ensemble method which can further improve the translation performance based on the ensemble results. We achieve a BLEU score of 36.8 and the highest chrF score of 0.648 on the Chinese-English task.

2018

Generative Bridging Network for Neural Sequence Prediction
Wenhu Chen | Guanlin Li | Shuo Ren | Shujie Liu | Zhirui Zhang | Mu Li | Ming Zhou
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE directly maximizing the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize their KL-divergence. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve learning burden. Experiments conducted on two recognized sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong baselines. Furthermore, by analyzing samples drawn from different bridges, expected influences on the generator are verified.

Bidirectional Generative Adversarial Networks for Neural Machine Translation
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 22nd Conference on Computational Natural Language Learning

Generative Adversarial Network (GAN) has been proposed to tackle the exposure bias problem of Neural Machine Translation (NMT). However, the discriminator typically results in the instability of the GAN training due to the inadequate training problem: the search space is so huge that sampled translations are not sufficient for discriminator training. To address this issue and stabilize the GAN training, in this paper, we propose a novel Bidirectional Generative Adversarial Network for Neural Machine Translation (BGAN-NMT), which aims to introduce a generator model to act as the discriminator, whereby the discriminator naturally considers the entire translation space so that the inadequate training problem can be alleviated. To satisfy this property, generator and discriminator are both designed to model the joint probability of sentence pairs, with the difference that, the generator decomposes the joint probability with a source language model and a source-to-target translation model, while the discriminator is formulated as a target language model and a target-to-source translation model. To further leverage the symmetry of them, an auxiliary GAN is introduced and adopts generator and discriminator models of original one as its own discriminator and generator respectively. Two GANs are alternately trained to update the parameters. Experiment results on German-English and Chinese-English translation tasks demonstrate that our method not only stabilizes GAN training but also achieves significant improvements over baseline systems.

Triangular Architecture for Rare Language Translation
Shuo Ren | Wenhu Chen | Shujie Liu | Mu Li | Ming Zhou | Shuai Ma
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural Machine Translation (NMT) performs poorly on the low-resource language pair (X,Z), especially when Z is a rare language. By introducing another rich language Y, we propose a novel triangular training architecture (TA-NMT) to leverage bilingual data (Y,Z) (may be small) and (X,Y) (can be rich) to improve the translation performance of low-resource pairs. In this triangular architecture, Z is taken as the intermediate latent variable, and translation models of Z are jointly optimized with a unified bidirectional EM algorithm under the goal of maximizing the translation likelihood of (X,Y). Empirical results demonstrate that our method significantly improves the translation quality of rare languages on MultiUN and IWSLT2012 datasets, and achieves even better performance when combined with back-translation methods.

2017

Stack-based Multi-layer Attention for Transition-based Dependency Parsing
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Although sequence-to-sequence (seq2seq) network has achieved significant success in many NLP tasks such as machine translation and text summarization, simply applying this approach to transition-based dependency parsing cannot yield a comparable performance gain as in other state-of-the-art methods, such as stack-LSTM and head selection. In this paper, we propose a stack-based multi-layer attention model for seq2seq learning to better leverage structural linguistics information. In our method, two binary vectors are used to track the decoding stack in transition-based parsing, and multi-layer attention is introduced to capture multiple word dependencies in partial trees. We conduct experiments on PTB and CTB datasets, and the results show that our proposed model achieves state-of-the-art accuracy and significant improvement in labeled precision with respect to the baseline seq2seq model.

Sequence-to-Dependency Neural Machine Translation
Shuangzhi Wu | Dongdong Zhang | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Nowadays a typical Neural Machine Translation (NMT) model generates translations from left to right as a linear sequence, during which latent syntactic structures of the target sentences are not explicitly considered. Inspired by the success of using syntactic knowledge of the target language for improving statistical machine translation, in this paper we propose a novel Sequence-to-Dependency Neural Machine Translation (SD-NMT) method, in which the target word sequence and its corresponding dependency structure are jointly constructed and modeled, and this structure is used as context to facilitate word generation. Experimental results show that the proposed method significantly outperforms state-of-the-art baselines on Chinese-English and Japanese-English translation tasks.

Chunk-based Decoder for Neural Machine Translation
Shonosuke Ishiwatari | Jingtao Yao | Shujie Liu | Mu Li | Ming Zhou | Naoki Yoshinaga | Masaru Kitsuregawa | Weijia Jia
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chunks (or phrases) once played a pivotal role in machine translation. By using a chunk rather than a word as the basic translation unit, local (intra-chunk) and global (inter-chunk) word orders and dependencies can be easily modeled. The chunk structure, despite its importance, has not been considered in the decoders used for neural machine translation (NMT). In this paper, we propose chunk-based decoders for NMT, each of which consists of a chunk-level decoder and a word-level decoder. The chunk-level decoder models global dependencies while the word-level decoder decides the local word order in a chunk. To output a target sentence, the chunk-level decoder generates a chunk representation containing global information, which the word-level decoder then uses as a basis to predict the words inside the chunk. Experimental results show that our proposed decoders can significantly improve translation performance in a WAT '16 English-to-Japanese translation task.

2016

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng | Shujie Liu | Nan Yang | Mu Li | Ming Zhou | Kenny Q. Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation. We propose a recurrent attention mechanism as an implicit distortion model, and a fertility conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both the alignment and translation quality compared to the original attention mechanism and several other variations.

Knowledge-Based Semantic Embedding for Machine Translation
Chen Shi | Shujie Liu | Shuo Ren | Shi Feng | Mu Li | Ming Zhou | Xu Sun | Houfeng Wang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Hierarchical Recurrent Neural Network for Document Modeling
Rui Lin | Shujie Liu | Muyun Yang | Mu Li | Ming Zhou | Sheng Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Bilingually-constrained Phrase Embeddings for Machine Translation
Jiajun Zhang | Shujie Liu | Mu Li | Ming Zhou | Chengqing Zong
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning Topic Representation for SMT with Neural Networks
Lei Cui | Dongdong Zhang | Shujie Liu | Qiming Chen | Mu Li | Ming Zhou | Muyun Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Recursive Recurrent Neural Network for Statistical Machine Translation
Shujie Liu | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Lexicalized Reordering Model for Hierarchical Phrase-based Translation
Hailong Cao | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Efficient Collective Entity Linking with Stacking
Zhengyan He | Shujie Liu | Yang Song | Mu Li | Ming Zhou | Houfeng Wang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Multi-Domain Adaptation for SMT Using Multi-Task Learning
Lei Cui | Xilun Chen | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Word Alignment Modeling with Context Dependent Deep Neural Network
Nan Yang | Shujie Liu | Mu Li | Ming Zhou | Nenghai Yu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Punctuation Prediction with Transition-based Parsing
Dongdong Zhang | Shuangzhi Wu | Nan Yang | Mu Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning Entity Representation for Entity Disambiguation
Zhengyan He | Shujie Liu | Mu Li | Ming Zhou | Longkai Zhang | Houfeng Wang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Bilingual Data Cleaning for SMT using Graph-based Random Walk
Lei Cui | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Learning Translation Consensus with Structured Label Propagation
Shujie Liu | Chi-Ho Li | Mu Li | Ming Zhou
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Nan Yang | Mu Li | Dongdong Zhang | Nenghai Yu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hierarchical Chunk-to-String Translation
Yang Feng | Dongdong Zhang | Mu Li | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
Seung-Wook Lee | Dongdong Zhang | Mu Li | Ming Zhou | Hae-Chang Rim
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Forced Derivation Tree based Model Training to Statistical Machine Translation
Nan Duan | Mu Li | Ming Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Re-training Monolingual Parser Bilingually for Syntactic SMT
Shujie Liu | Chi-Ho Li | Mu Li | Ming Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Hypothesis Mixture Decoding for Statistical Machine Translation
Nan Duan | Mu Li | Ming Zhou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Function Word Generation in Statistical Machine Translation Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of Machine Translation Summit XIII: Papers

Improving Phrase Extraction via MBR Phrase Scoring and Pruning
Nan Duan | Mu Li | Ming Zhou | Lei Cui
Proceedings of Machine Translation Summit XIII: Papers

2010

A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of the ACL 2010 Conference Short Papers

Mixture Model-based Minimum Bayes Risk Decoding using Multiple Machine Translation Systems
Nan Duan | Mu Li | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Mu Li | Yinggong Zhao | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Hybrid Decoding: Decoding with Partial Hypotheses Combination over Multiple SMT Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Coling 2010: Posters

2009

Better Synchronous Binarization for Machine Translation
Tong Xiao | Mu Li | Dongdong Zhang | Jingbo Zhu | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

The Feature Subspace Method for SMT System Combination
Nan Duan | Mu Li | Tong Xiao | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Extracting Keyphrases from Chinese News Articles Using TextRank and Query Log Knowledge
Weiming Liang | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders
Mu Li | Nan Duan | Dongdong Zhang | Chi-Ho Li | Ming Zhou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Introduction to China’s CWMT2008 Machine Translation Evaluation
Hongmei Zhao | Jun Xie | Qun Liu | Yajuan Lü | Dongdong Zhang | Mu Li
Proceedings of Machine Translation Summit XII: Papers

2008

An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
Chi-Ho Li | Hailei Zhang | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of the Third Workshop on Statistical Machine Translation

Measure Word Generation for English-Chinese SMT Systems
Dongdong Zhang | Mu Li | Nan Duan | Chi-Ho Li | Ming Zhou
Proceedings of ACL-08: HLT

Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points
Ming Zhou | Bo Wang | Shujie Liu | Mu Li | Dongdong Zhang | Tiejun Zhao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
Chi-Ho Li | Minghui Li | Dongdong Zhang | Mu Li | Ming Zhou | Yi Guan
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Improving Query Spelling Correction Using Web Search Results
Qing Chen | Mu Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Phrase Reordering Model Integrating Syntactic Knowledge for SMT
Dongdong Zhang | Mu Li | Chi-Ho Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Exploring Distributional Similarity Based Models for Query Spelling Correction
Mu Li | Muhua Zhu | Yang Zhang | Ming Zhou
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

An Improved Chinese Word Segmentation System with Conditional Random Field
Hai Zhao | Chang-Ning Huang | Mu Li
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

Discriminative Reranking for Spelling Correction
Yang Zhang | Pilian He | Wei Xiang | Mu Li
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling
Hai Zhao | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

2005

Detecting Segmentation Errors in Chinese Annotated Corpus
Chengjie Sun | Chang-Ning Huang | Xiaolong Wang | Mu Li
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Jianfeng Gao | Mu Li | Andi Wu | Chang-Ning Huang
Computational Linguistics, Volume 31, Number 4, December 2005

2004

Adaptive Chinese Word Segmentation
Jianfeng Gao | Andi Wu | Mu Li | Chang-Ning Huang | Hongqiao Li | Xinsong Xia | Haowei Qin
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Improved Source-Channel Models for Chinese Word Segmentation
Jianfeng Gao | Mu Li | Chang-Ning Huang
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Unsupervised Training for Overlapping Ambiguity Resolution in Chinese Word Segmentation
Mu Li | Jianfeng Gao | Chang-Ning Huang | Jianfeng Li
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

Single Character Chinese Named Entity Recognition
Xiaodan Zhu | Mu Li | Jianfeng Gao | Chang-Ning Huang
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing