Mu Li


2021

pdf bib
Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Embedding based methods are widely used for unsupervised keyphrase extraction (UKE) tasks. Generally, these methods simply calculate similarities between phrase embeddings and document embedding, which is insufficient to capture different context for a more effective UKE model. In this paper, we propose a novel method for UKE, where local and global contexts are jointly modeled. From a global view, we calculate the similarity between a certain phrase and the whole document in the vector space as transitional embedding based models do. In terms of the local view, we first build a graph structure based on the document where phrases are regarded as vertices and the edges are similarities between vertices. Then, we proposed a new centrality computation method to capture local salient information based on the graph structure. Finally, we further combine the modeling of global and local context for ranking. We evaluate our models on three public benchmarks (Inspec, DUC 2001, SemEval 2010) and compare with existing state-of-the-art models. The results show that our model outperforms most models while generalizing better on input documents with different domains and length. Additional ablation study shows that both the local and global information is crucial for unsupervised keyphrase extraction tasks.

pdf bib
Recurrent Attention for Neural Machine Translation
Jiali Zeng | Shuangzhi Wu | Yongjing Yin | Yufan Jiang | Mu Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further in this research line and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN) . RAN directly learns attention weights without any token-to-token interaction and further improves their capacity by layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterpart in certain scenarios, with fewer parameters and inference time. Particularly, when apply RAN to the decoder of Transformer, there brings consistent improvements by about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on Turkish-English translation task. In addition, we conduct extensive analysis on the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative to build more effective and efficient NMT models.

pdf bib
Improving Unsupervised Extractive Summarization with Facet-Aware Modeling
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
Haoyu He | Xingjian Shi | Jonas Mueller | Sheng Zha | Mu Li | George Karypis
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Knowledge Distillation (KD) offers a natural way to reduce the latency and memory/energy usage of massive pretrained models that have come to dominate Natural Language Processing (NLP) in recent years. While numerous sophisticated variants of KD algorithms have been proposed for NLP applications, the key factors underpinning the optimal distillation performance are often confounded and remain unclear. We aim to identify how different components in the KD pipeline affect the resulting performance and how much the optimal KD pipeline varies across different datasets/tasks, such as the data augmentation policy, the loss function, and the intermediate representation for transferring the knowledge between teacher and student. To tease apart their effects, we propose Distiller, a meta KD framework that systematically combines a broad range of techniques across different stages of the KD pipeline, which enables us to quantify each component’s contribution. Within Distiller, we unify commonly used objectives for distillation of intermediate representations under a universal mutual information (MI) objective and propose a class of MI-objective functions with better bias/variance trade-off for estimating the MI between the teacher and the student. On a diverse set of NLP datasets, the best Distiller configurations are identified via large-scale hyper-parameter optimization. Our experiments reveal the following: 1) the approach used to distill the intermediate representations is the most important factor in KD performance, 2) among different objectives for intermediate distillation, MI-performs the best, and 3) data augmentation provides a large boost for small training datasets or small student networks. Moreover, we find that different datasets/tasks prefer different KD algorithms, and thus propose a simple AutoDistiller algorithm that can recommend a good KD pipeline for a new dataset.

pdf bib
Attention Calibration for Transformer in Neural Machine Translation
Yu Lu | Jiali Zeng | Jiajun Zhang | Shuangzhi Wu | Mu Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Attention mechanisms have achieved substantial improvements in neural machine translation by dynamically selecting relevant inputs for different predictions. However, recent studies have questioned the attention mechanisms’ capability for discovering decisive inputs. In this paper, we propose to calibrate the attention weights by introducing a mask perturbation model that automatically evaluates each input’s contribution to the model outputs. We increase the attention weights assigned to the indispensable tokens, whose removal leads to a dramatic performance decrease. The extensive experiments on the Transformer-based translation have demonstrated the effectiveness of our model. We further find that the calibrated attention weights are more uniform at lower layers to collect multiple information while more concentrated on the specific inputs at higher layers. Detailed analyses also show a great need for calibration in the attention weights with high entropy where the model is unconfident about its decision.

2020

pdf bib
Tencent Neural Machine Translation Systems for the WMT20 News Translation Task
Shuangzhi Wu | Xing Wang | Longyue Wang | Fangxu Liu | Jun Xie | Zhaopeng Tu | Shuming Shi | Mu Li
Proceedings of the Fifth Conference on Machine Translation

This paper describes Tencent Neural Machine Translation systems for the WMT 2020 news translation tasks. We participate in the shared news translation task on English Chinese and English German language pairs. Our systems are built on deep Transformer and several data augmentation methods. We propose a boosted in-domain finetuning method to improve single models. Ensemble is used to combine single models and we propose an iterative transductive ensemble method which can further improve the translation performance based on the ensemble results. We achieve a BLEU score of 36.8 and the highest chrF score of 0.648 on Chinese English task.

pdf bib
Emotion Classification by Jointly Learning to Lexiconize and Classify
Deyu Zhou | Shuangzhi Wu | Qing Wang | Jun Xie | Zhaopeng Tu | Mu Li
Proceedings of the 28th International Conference on Computational Linguistics

Emotion lexicons have been shown effective for emotion classification (Baziotis et al., 2018). Previous studies handle emotion lexicon construction and emotion classification separately. In this paper, we propose an emotional network (EmNet) to jointly learn sentence emotions and construct emotion lexicons which are dynamically adapted to a given context. The dynamic emotion lexicons are useful for handling words with multiple emotions based on different context, which can effectively improve the classification accuracy. We validate the approach on two representative architectures – LSTM and BERT, demonstrating its superiority on identifying emotions in Tweets. Our model outperforms several approaches proposed in previous studies and achieves new state-of-the-art on the benchmark Twitter dataset.

2018

pdf bib
Triangular Architecture for Rare Language Translation
Shuo Ren | Wenhu Chen | Shujie Liu | Mu Li | Ming Zhou | Shuai Ma
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural Machine Translation (NMT) performs poor on the low-resource language pair (X,Z), especially when Z is a rare language. By introducing another rich language Y, we propose a novel triangular training architecture (TA-NMT) to leverage bilingual data (Y,Z) (may be small) and (X,Y) (can be rich) to improve the translation performance of low-resource pairs. In this triangular architecture, Z is taken as the intermediate latent variable, and translation models of Z are jointly optimized with an unified bidirectional EM algorithm under the goal of maximizing the translation likelihood of (X,Y). Empirical results demonstrate that our method significantly improves the translation quality of rare languages on MultiUN and IWSLT2012 datasets, and achieves even better performance combining back-translation methods.

pdf bib
Bidirectional Generative Adversarial Networks for Neural Machine Translation
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 22nd Conference on Computational Natural Language Learning

Generative Adversarial Network (GAN) has been proposed to tackle the exposure bias problem of Neural Machine Translation (NMT). However, the discriminator typically results in the instability of the GAN training due to the inadequate training problem: the search space is so huge that sampled translations are not sufficient for discriminator training. To address this issue and stabilize the GAN training, in this paper, we propose a novel Bidirectional Generative Adversarial Network for Neural Machine Translation (BGAN-NMT), which aims to introduce a generator model to act as the discriminator, whereby the discriminator naturally considers the entire translation space so that the inadequate training problem can be alleviated. To satisfy this property, generator and discriminator are both designed to model the joint probability of sentence pairs, with the difference that, the generator decomposes the joint probability with a source language model and a source-to-target translation model, while the discriminator is formulated as a target language model and a target-to-source translation model. To further leverage the symmetry of them, an auxiliary GAN is introduced and adopts generator and discriminator models of original one as its own discriminator and generator respectively. Two GANs are alternately trained to update the parameters. Experiment results on German-English and Chinese-English translation tasks demonstrate that our method not only stabilizes GAN training but also achieves significant improvements over baseline systems.

pdf bib
Generative Bridging Network for Neural Sequence Prediction
Wenhu Chen | Guanlin Li | Shuo Ren | Shujie Liu | Zhirui Zhang | Mu Li | Ming Zhou
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE directly maximizing the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize their KL-divergence. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve learning burden. Experiments conducted on two recognized sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong baselines. Furthermore, by analyzing samples drawn from different bridges, expected influences on the generator are verified.

2017

pdf bib
Sequence-to-Dependency Neural Machine Translation
Shuangzhi Wu | Dongdong Zhang | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Nowadays a typical Neural Machine Translation (NMT) model generates translations from left to right as a linear sequence, during which latent syntactic structures of the target sentences are not explicitly concerned. Inspired by the success of using syntactic knowledge of target language for improving statistical machine translation, in this paper we propose a novel Sequence-to-Dependency Neural Machine Translation (SD-NMT) method, in which the target word sequence and its corresponding dependency structure are jointly constructed and modeled, and this structure is used as context to facilitate word generations. Experimental results show that the proposed method significantly outperforms state-of-the-art baselines on Chinese-English and Japanese-English translation tasks.

pdf bib
Chunk-based Decoder for Neural Machine Translation
Shonosuke Ishiwatari | Jingtao Yao | Shujie Liu | Mu Li | Ming Zhou | Naoki Yoshinaga | Masaru Kitsuregawa | Weijia Jia
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chunks (or phrases) once played a pivotal role in machine translation. By using a chunk rather than a word as the basic translation unit, local (intra-chunk) and global (inter-chunk) word orders and dependencies can be easily modeled. The chunk structure, despite its importance, has not been considered in the decoders used for neural machine translation (NMT). In this paper, we propose chunk-based decoders for (NMT), each of which consists of a chunk-level decoder and a word-level decoder. The chunk-level decoder models global dependencies while the word-level decoder decides the local word order in a chunk. To output a target sentence, the chunk-level decoder generates a chunk representation containing global information, which the word-level decoder then uses as a basis to predict the words inside the chunk. Experimental results show that our proposed decoders can significantly improve translation performance in a WAT ‘16 English-to-Japanese translation task.

pdf bib
Stack-based Multi-layer Attention for Transition-based Dependency Parsing
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Although sequence-to-sequence (seq2seq) network has achieved significant success in many NLP tasks such as machine translation and text summarization, simply applying this approach to transition-based dependency parsing cannot yield a comparable performance gain as in other state-of-the-art methods, such as stack-LSTM and head selection. In this paper, we propose a stack-based multi-layer attention model for seq2seq learning to better leverage structural linguistics information. In our method, two binary vectors are used to track the decoding stack in transition-based parsing, and multi-layer attention is introduced to capture multiple word dependencies in partial trees. We conduct experiments on PTB and CTB datasets, and the results show that our proposed model achieves state-of-the-art accuracy and significant improvement in labeled precision with respect to the baseline seq2seq model.

2016

pdf bib
Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng | Shujie Liu | Nan Yang | Mu Li | Ming Zhou | Kenny Q. Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation. We propose a recurrent attention mechanism as an implicit distortion model, and a fertility conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both the alignment and translation quality compared to the original attention mechanism and several other variations.

pdf bib
Knowledge-Based Semantic Embedding for Machine Translation
Chen Shi | Shujie Liu | Shuo Ren | Shi Feng | Mu Li | Ming Zhou | Xu Sun | Houfeng Wang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Hierarchical Recurrent Neural Network for Document Modeling
Rui Lin | Shujie Liu | Muyun Yang | Mu Li | Ming Zhou | Sheng Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
A Lexicalized Reordering Model for Hierarchical Phrase-based Translation
Hailong Cao | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Bilingually-constrained Phrase Embeddings for Machine Translation
Jiajun Zhang | Shujie Liu | Mu Li | Ming Zhou | Chengqing Zong
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Learning Topic Representation for SMT with Neural Networks
Lei Cui | Dongdong Zhang | Shujie Liu | Qiming Chen | Mu Li | Ming Zhou | Muyun Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Recursive Recurrent Neural Network for Statistical Machine Translation
Shujie Liu | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Efficient Collective Entity Linking with Stacking
Zhengyan He | Shujie Liu | Yang Song | Mu Li | Ming Zhou | Houfeng Wang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Multi-Domain Adaptation for SMT Using Multi-Task Learning
Lei Cui | Xilun Chen | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Word Alignment Modeling with Context Dependent Deep Neural Network
Nan Yang | Shujie Liu | Mu Li | Ming Zhou | Nenghai Yu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Punctuation Prediction with Transition-based Parsing
Dongdong Zhang | Shuangzhi Wu | Nan Yang | Mu Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Learning Entity Representation for Entity Disambiguation
Zhengyan He | Shujie Liu | Mu Li | Ming Zhou | Longkai Zhang | Houfeng Wang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Bilingual Data Cleaning for SMT using Graph-based Random Walk
Lei Cui | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Learning Translation Consensus with Structured Label Propagation
Shujie Liu | Chi-Ho Li | Mu Li | Ming Zhou
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Nan Yang | Mu Li | Dongdong Zhang | Nenghai Yu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Hierarchical Chunk-to-String Translation
Yang Feng | Dongdong Zhang | Mu Li | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
Seung-Wook Lee | Dongdong Zhang | Mu Li | Ming Zhou | Hae-Chang Rim
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Forced Derivation Tree based Model Training to Statistical Machine Translation
Nan Duan | Mu Li | Ming Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Re-training Monolingual Parser Bilingually for Syntactic SMT
Shujie Liu | Chi-Ho Li | Mu Li | Ming Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Function Word Generation in Statistical Machine Translation Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Improving Phrase Extraction via MBR Phrase Scoring and Pruning
Nan Duan | Mu Li | Ming Zhou | Lei Cui
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Hypothesis Mixture Decoding for Statistical Machine Translation
Nan Duan | Mu Li | Ming Zhou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Mixture Model-based Minimum Bayes Risk Decoding using Multiple Machine Translation Systems
Nan Duan | Mu Li | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Mu Li | Yinggong Zhao | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Hybrid Decoding: Decoding with Partial Hypotheses Combination over Multiple SMT Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Coling 2010: Posters

2009

pdf bib
Extracting Keyphrases from Chinese News Articles Using TextRank and Query Log Knowledge
Weiming Liang | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
Better Synchronous Binarization for Machine Translation
Tong Xiao | Mu Li | Dongdong Zhang | Jingbo Zhu | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
The Feature Subspace Method for SMT System Combination
Nan Duan | Mu Li | Tong Xiao | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Introduction to China’s CWMT2008 Machine Translation Evaluation
Hongmei Zhao | Jun Xie | Qun Liu | Yajuan Lü | Dongdong Zhang | Mu Li
Proceedings of Machine Translation Summit XII: Papers

pdf bib
Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders
Mu Li | Nan Duan | Dongdong Zhang | Chi-Ho Li | Ming Zhou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
Measure Word Generation for English-Chinese SMT Systems
Dongdong Zhang | Mu Li | Nan Duan | Chi-Ho Li | Ming Zhou
Proceedings of ACL-08: HLT

pdf bib
An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
Chi-Ho Li | Hailei Zhang | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points
Ming Zhou | Bo Wang | Shujie Liu | Mu Li | Dongdong Zhang | Tiejun Zhao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
Chi-Ho Li | Minghui Li | Dongdong Zhang | Mu Li | Ming Zhou | Yi Guan
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Improving Query Spelling Correction Using Web Search Results
Qing Chen | Mu Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Phrase Reordering Model Integrating Syntactic Knowledge for SMT
Dongdong Zhang | Mu Li | Chi-Ho Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
An Improved Chinese Word Segmentation System with Conditional Random Field
Hai Zhao | Chang-Ning Huang | Mu Li
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

pdf bib
Discriminative Reranking for Spelling Correction
Yang Zhang | Pilian He | Wei Xiang | Mu Li
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

pdf bib
Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling
Hai Zhao | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

pdf bib
Exploring Distributional Similarity Based Models for Query Spelling Correction
Mu Li | Muhua Zhu | Yang Zhang | Ming Zhou
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

pdf bib
Detecting Segmentation Errors in Chinese Annotated Corpus
Chengjie Sun | Chang-Ning Huang | Xiaolong Wang | Mu Li
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

pdf bib
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Jianfeng Gao | Mu Li | Andi Wu | Chang-Ning Huang
Computational Linguistics, Volume 31, Number 4, December 2005

2004

pdf bib
Adaptive Chinese Word Segmentation
Jianfeng Gao | Andi Wu | Mu Li | Chang-Ning Huang | Hongqiao Li | Xinsong Xia | Haowei Qin
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

pdf bib
Improved Source-Channel Models for Chinese Word Segmentation
Jianfeng Gao | Mu Li | Chang-Ning Huang
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Unsupervised Training for Overlapping Ambiguity Resolution in Chinese Word Segmentation
Mu Li | Jianfeng Gao | Chang-Ning Huang | Jianfeng Li
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

pdf bib
Single Character Chinese Named Entity Recognition
Xiaodan Zhu | Mu Li | Jianfeng Gao | Chang-Ning Huang
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing