Jingbo Zhu

Also published as: JingBo Zhu


2022

pdf bib
The NiuTrans’s Submission to the IWSLT22 English-to-Chinese Offline Speech Translation Task
Yuhao Zhang | Canan Huang | Chen Xu | Xiaoqian Liu | Bei Li | Anxiang Ma | Tong Xiao | Jingbo Zhu
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes NiuTrans’s submission to the IWSLT22 English-to-Chinese (En-Zh) offline speech translation task. The end-to-end and bilingual system is built by constrained English and Chinese data and translates the English speech to Chinese text without intermediate transcription. Our speech translation models are composed of different pre-trained acoustic models and machine translation models by two kinds of adapters. We compared the effect of the standard speech feature (e.g. log Mel-filterbank) and the pre-training speech feature and try to make them interact. The final submission is an ensemble of three potential speech translation models. Our single best and ensemble model achieves 18.66 BLEU and 19.35 BLEU separately on MuST-C En-Zh tst-COMMON set.

pdf bib
On Vision Features in Multimodal Machine Translation
Bei Li | Chuanhao Lv | Zefan Zhou | Tao Zhou | Tong Xiao | Anxiang Ma | JingBo Zhu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Previous work on multimodal machine translation (MMT) has focused on the way of incorporating vision features into translation but little attention is on the quality of vision models. In this work, we investigate the impact of vision models on MMT. Given the fact that Transformer is becoming popular in computer vision, we experiment with various strong models (such as Vision Transformer) and enhanced features (such as object-detection and image captioning). We develop a selective attention model to study the patch-level contribution of an image in MMT. On detailed probing tasks, we find that stronger vision models are helpful for learning translation from the visual modality. Our results also suggest the need of carefully examining MMT models, especially when current benchmarks are small-scale and biased.

pdf bib
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
Bei Li | Quan Du | Tao Zhou | Yi Jing | Shuhan Zhou | Xin Zeng | Tong Xiao | JingBo Zhu | Xuebo Liu | Min Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODE). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE. As a natural extension to Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on the large-scale machine translation, abstractive summarization, and grammar error correction tasks demonstrate the high genericity of ODE Transformer. It can gain large improvements in model performance over strong baselines (e.g., 30.77 and 44.11 BLEU scores on the WMT’14 English-German and English-French benchmarks) at a slight cost in inference efficiency.

2021

pdf bib
RankNAS: Efficient Neural Architecture Search by Pairwise Ranking
Chi Hu | Chenglong Wang | Xiangnan Ma | Xia Meng | Yinqiao Li | Tong Xiao | Jingbo Zhu | Changliang Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper addresses the efficiency challenge of Neural Architecture Search (NAS) by formulating the task as a ranking problem. Previous methods require numerous training examples to estimate the accurate performance of architectures, although the actual goal is to find the distinction between “good” and “bad” candidates. Here we do not resort to performance predictors. Instead, we propose a performance ranking method (RankNAS) via pairwise ranking. It enables efficient architecture search using much fewer training examples. Moreover, we develop an architecture selection method to prune the search space and concentrate on more promising candidates. Extensive experiments on machine translation and language modeling tasks show that RankNAS can design high-performance architectures while being orders of magnitude faster than state-of-the-art NAS systems.

pdf bib
The NiuTrans Machine Translation Systems for WMT21
Shuhan Zhou | Tao Zhou | Binghao Wei | Yingfeng Luo | Yongyu Mu | Zefan Zhou | Chenglong Wang | Xuanjun Zhou | Chuanhao Lv | Yi Jing | Laohu Wang | Jingnan Zhang | Canan Huang | Zhongxiang Yan | Chi Hu | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Sixth Conference on Machine Translation

This paper describes NiuTrans neural machine translation systems of the WMT 2021 news translation tasks. We made submissions to 9 language directions, including English2Chinese, Japanese, Russian, Icelandic and English2Hausa tasks. Our primary systems are built on several effective variants of Transformer, e.g., Transformer-DLCL, ODE-Transformer. We also utilize back-translation, knowledge distillation, post-ensemble, and iterative fine-tuning techniques to enhance the model performance further.

pdf bib
The NiuTrans System for the WMT 2021 Efficiency Task
Chenglong Wang | Chi Hu | Yongyu Mu | Zhongxiang Yan | Siming Wu | Yimin Hu | Hang Cao | Bei Li | Ye Lin | Tong Xiao | Jingbo Zhu
Proceedings of the Sixth Conference on Machine Translation

This paper describes the NiuTrans system for the WMT21 translation efficiency task. Following last year’s work, we explore various techniques to improve the efficiency while maintaining translation quality. We investigate the combinations of lightweight Transformer architectures and knowledge distillation strategies. Also, we improve the translation efficiency with graph optimization, low precision, dynamic batching, and parallel pre/post-processing. Putting these together, our system can translate 247,000 words per second on an NVIDIA A100, being 3× faster than our last year’s system. Our system is the fastest and has the lowest memory consumption on the GPU-throughput track. The code, model, and pipeline will be available at NiuTrans.NMT.

pdf bib
利用图像描述与知识图谱增强表示的视觉问答(Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering)
Gechao Wang (王屹超) | Muhua Zhu (朱慕华) | Chen Xu (许晨) | Yan Zhang (张琰) | Huizhen Wang (王会珍) | Jingbo Zhu (朱靖波)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

视觉问答作为多模态任务,需要深度理解图像和文本问题从而推理出答案。然而在许多情况下,仅在图像和问题上进行简单推理难以得到正确的答案,事实上还有其它有效的信息可以被利用,例如图像描述、外部知识等。针对以上问题,本文提出了利用图像描述和外部知识增强表示的视觉问答模型。该模型以问题为导向,基于协同注意力机制分别在图像和其描述上进行编码,并且利用知识图谱嵌入,将外部知识编码到模型当中,丰富了模型的特征表示,增强模型的推理能力。在OKVQA数据集上的实验结果表明本文方法相比基线系统有1.71%的准确率提升,与先前工作中的主流模型相比也有1.88%的准确率提升,证明了本文方法的有效性。

pdf bib
Bag of Tricks for Optimizing Transformer Efficiency
Ye Lin | Yanyang Li | Tong Xiao | Jingbo Zhu
Findings of the Association for Computational Linguistics: EMNLP 2021

Improving Transformer efficiency has become increasingly attractive recently. A wide range of methods has been proposed, e.g., pruning, quantization, new architectures and etc. But these methods are either sophisticated in implementation or dependent on hardware. In this paper, we show that the efficiency of Transformer can be improved by combining some simple and hardware-agnostic methods, including tuning hyper-parameters, better design choices and training strategies. On the WMT news translation tasks, we improve the inference efficiency of a strong Transformer system by 3.80x on CPU and 2.52x on GPU.

pdf bib
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu | Xiaoqian Liu | Xiaowen Liu | Tiger Wang | Canan Huang | Tong Xiao | Jingbo Zhu
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates from the English audio to German text directly without intermediate transcription. We use the Transformer-based model architecture and enhance it by Conformer, relative position encoding, and stacked acoustic and textual encoding. To augment the training data, the English transcriptions are translated to German translations. Finally, we employ ensemble decoding to integrate the predictions from several models trained with the different datasets. Combining these techniques, we achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.

pdf bib
Weight Distillation: Transferring the Knowledge in Neural Network Parameters
Ye Lin | Yanyang Li | Ziyang Wang | Bei Li | Quan Du | Tong Xiao | Jingbo Zhu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Knowledge distillation has been proven to be effective in model acceleration and compression. It transfers knowledge from a large neural network to a small one by using the large neural network predictions as targets of the small neural network. But this way ignores the knowledge inside the large neural networks, e.g., parameters. Our preliminary study as well as the recent success in pre-training suggests that transferring parameters are more effective in distilling knowledge. In this paper, we propose Weight Distillation to transfer the knowledge in parameters of a large neural network to a small neural network through a parameter generator. On the WMT16 En-Ro, NIST12 Zh-En, and WMT14 En-De machine translation tasks, our experiments show that weight distillation learns a small network that is 1.88 2.94x faster than the large network but with competitive BLEU performance. When fixing the size of small networks, weight distillation outperforms knowledge distillation by 0.51 1.82 BLEU points.

pdf bib
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
Chen Xu | Bojie Hu | Yanyang Li | Yuhao Zhang | Shen Huang | Qi Ju | Tong Xiao | Jingbo Zhu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Encoder pre-training is promising in end-to-end Speech Translation (ST), given the fact that speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic Speech Recognition (ASR) or Machine Translation (MT) encoders. For example, we find that ASR encoders lack the global context representation, which is necessary for translation, whereas MT encoders are not designed to deal with long but locally attentive acoustic sequences. In this work, we propose a Stacked Acoustic-and-Textual Encoding (SATE) method for speech translation. Our encoder begins with processing the acoustic sequence as usual, but later behaves more like an MT encoder for a global representation of the input sequence. In this way, it is straightforward to incorporate the pre-trained models into the system. Also, we develop an adaptor module to alleviate the representation inconsistency between the pre-trained ASR encoder and MT encoder, and develop a multi-teacher knowledge distillation method to preserve the pre-training knowledge. Experimental results on the LibriSpeech En-Fr and MuST-C En-De ST tasks show that our method achieves state-of-the-art BLEU scores of 18.3 and 25.2. To our knowledge, we are the first to develop an end-to-end ST system that achieves comparable or even better BLEU performance than the cascaded ST counterpart when large-scale ASR and MT data is available.

2020

pdf bib
Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation
Bei Li | Hui Liu | Ziyang Wang | Yufan Jiang | Tong Xiao | Jingbo Zhu | Tongran Liu | Changliang Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In encoder-decoder neural models, multiple encoders are in general used to represent the contextual information in addition to the individual sentence. In this paper, we investigate multi-encoder approaches in document-level neural machine translation (NMT). Surprisingly, we find that the context encoder does not only encode the surrounding sentences but also behaves as a noise generator. This makes us rethink the real benefits of multi-encoder in context-aware translation - some of the improvements come from robust training. We compare several methods that introduce noise and/or well-tuned dropout setup into the training of these encoders. Experimental results show that noisy training plays an important role in multi-encoder-based NMT, especially when the training data is small. Also, we establish a new state-of-the-art on IWSLT Fr-En task by careful use of noise generation and dropout methods.

pdf bib
Learning Architectures from an Extended Search Space for Language Modeling
Yinqiao Li | Chi Hu | Yuhao Zhang | Nuo Xu | Yufan Jiang | Tong Xiao | Jingbo Zhu | Tongran Liu | Changliang Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB. Moreover, the learned architectures show good transferability to other systems. E.g., they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.

pdf bib
The NiuTrans System for WNGT 2020 Efficiency Task
Chi Hu | Bei Li | Yinqiao Li | Ye Lin | Yanyang Li | Chenglong Wang | Tong Xiao | Jingbo Zhu
Proceedings of the Fourth Workshop on Neural Generation and Translation

This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task. We focus on the efficient implementation of deep Transformer models (Wang et al., 2019; Li et al., 2019) using NiuTensor, a flexible toolkit for NLP tasks. We explored the combination of deep encoder and shallow decoder in Transformer models via model compression and knowledge distillation. The neural machine translation decoding also benefits from FP16 inference, attention caching, dynamic batching, and batch pruning. Our systems achieve promising results in both translation quality and efficiency, e.g., our fastest system can translate more than 40,000 tokens per second with an RTX 2080 Ti while maintaining 42.9 BLEU on newstest2018.

pdf bib
Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
Chen Xu | Bojie Hu | Yufan Jiang | Kai Feng | Zeyang Wang | Shen Huang | Qi Ju | Tong Xiao | Jingbo Zhu
Proceedings of the 28th International Conference on Computational Linguistics

Large amounts of data has made neural machine translation (NMT) a big success in recent years. But it is still a challenge if we train these models on small-scale corpora. In this case, the way of using data appears to be more important. Here, we investigate the effective use of training data for low-resource NMT. In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples in training. Unlike previous work, we do not use a static scoring function for reordering. Instead, the order of training samples is dynamically determined in two ways - loss decline and model competence. This eases training by highlighting easy samples that the current model has enough competence to learn. We test our DCL method in a Transformer-based system. Experimental results show that DCL outperforms several strong baselines on three low-resource machine translation benchmarks and different sized data of WMT’16 En-De.

pdf bib
Layer-Wise Multi-View Learning for Neural Machine Translation
Qiang Wang | Changliang Li | Yue Zhang | Tong Xiao | Jingbo Zhu
Proceedings of the 28th International Conference on Computational Linguistics

Traditional neural machine translation is limited to the topmost encoder layer’s context representation and cannot directly perceive the lower encoder layers. Existing solutions usually rely on the adjustment of network architecture, making the calculation more complicated or introducing additional structural restrictions. In this work, we propose layer-wise multi-view learning to solve this problem, circumventing the necessity to change the model structure. We regard each encoder layer’s off-the-shelf output, a by-product in layer-by-layer encoding, as the redundant view for the input sentence. In this way, in addition to the topmost encoder layer (referred to as the primary view), we also incorporate an intermediate encoder layer as the auxiliary view. We feed the two views to a partially shared decoder to maintain independent prediction. Consistency regularization based on KL divergence is used to encourage the two views to learn from each other. Extensive experimental results on five translation tasks show that our approach yields stable improvements over multiple strong baselines. As another bonus, our method is agnostic to network architectures and can maintain the same inference speed as the original model.

pdf bib
A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction
Yanyang Li | Yingfeng Luo | Ye Lin | Quan Du | Huizhen Wang | Shujian Huang | Tong Xiao | Jingbo Zhu
Proceedings of the 28th International Conference on Computational Linguistics

Unsupervised Bilingual Dictionary Induction methods based on the initialization and the self-learning have achieved great success in similar language pairs, e.g., English-Spanish. But they still fail and have an accuracy of 0% in many distant language pairs, e.g., English-Japanese. In this work, we show that this failure results from the gap between the actual initialization performance and the minimum initialization performance for the self-learning to succeed. We propose Iterative Dimension Reduction to bridge this gap. Our experiments show that this simple method does not hamper the performance of similar language pairs and achieves an accuracy of 13.64 55.53% between English and four distant languages, i.e., Chinese, Japanese, Vietnamese and Thai.

pdf bib
Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation
Qiang Wang | Tong Xiao | Jingbo Zhu
Findings of the Association for Computational Linguistics: EMNLP 2020

The standard neural machine translation model can only decode with the same depth configuration as training. Restricted by this feature, we have to deploy models of various sizes to maintain the same translation latency, because the hardware conditions on different terminal devices (e.g., mobile phones) may vary greatly. Such individual training leads to increased model maintenance costs and slower model iterations, especially for the industry. In this work, we propose to use multi-task learning to train a flexible depth model that can adapt to different depth configurations during inference. Experimental results show that our approach can simultaneously support decoding in 24 depth configurations and is superior to the individual training and another flexible depth model training method——LayerDrop.

pdf bib
Shallow-to-Deep Training for Neural Machine Translation
Bei Li | Ziyang Wang | Hui Liu | Yufan Jiang | Quan Du | Tong Xiao | Huizhen Wang | Jingbo Zhu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time consuming. Moreover, why deep models help NMT is an open question. In this paper, we investigate the behavior of a well-tuned deep Transformer system. We find that stacking layers is helpful in improving the representation ability of NMT models and adjacent layers perform similarly. This inspires us to develop a shallow-to-deep training method that learns deep models by stacking shallow models. In this way, we successfully train a Transformer system with a 54-layer encoder. Experimental results on WMT’16 English-German and WMT’14 English-French translation tasks show that it is 1:4 faster than training from scratch, and achieves a BLEU score of 30:33 and 43:29 on two tasks. The code is publicly available at https://github.com/libeineu/SDT-Training.

pdf bib
The NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang | Ziyang Wang | Runzhe Cao | Binghao Wei | Weiqiao Shan | Shuhan Zhou | Abudurexiti Reheman | Tao Zhou | Xin Zeng | Laohu Wang | Yongyu Mu | Jingnan Zhang | Xiaoqian Liu | Xuanjun Zhou | Yinqiao Li | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation

This paper describes NiuTrans neural machine translation systems of the WMT20 news translation tasks. We participated in Japanese<->English, English->Chinese, Inuktitut->English and Tamil->English total five tasks and rank first in Japanese<->English both sides. We mainly utilized iterative back-translation, different depth and widen model architectures, iterative knowledge distillation and iterative fine-tuning. And we find that adequately widened and deepened the model simultaneously, the performance will significantly improve. Also, iterative fine-tuning strategy we implemented is effective during adapting domain. For Inuktitut->English and Tamil->English tasks, we built multilingual models separately and employed pretraining word embedding to obtain better performance.

pdf bib
The NiuTrans System for the WMT20 Quality Estimation Shared Task
Chi Hu | Hui Liu | Kai Feng | Chen Xu | Nuo Xu | Zefan Zhou | Shiqin Yan | Yingfeng Luo | Chenglong Wang | Xia Meng | Tong Xiao | Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation

This paper describes the submissions of the NiuTrans Team to the WMT 2020 Quality Estimation Shared Task. We participated in all tasks and all language pairs. We explored the combination of transfer learning, multi-task learning and model ensemble. Results on multiple tasks show that deep transformer machine translation models and multilingual pretraining methods significantly improve translation quality estimation performance. Our system achieved remarkable results in multiple level tasks, e.g., our submissions obtained the best results on all tracks in the sentence-level Direct Assessment task.

2019

pdf bib
Improved Differentiable Architecture Search for Language Modeling and Named Entity Recognition
Yufan Jiang | Chi Hu | Tong Xiao | Chunliang Zhang | Jingbo Zhu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we study differentiable neural architecture search (NAS) methods for natural language processing. In particular, we improve differentiable architecture search by removing the softmax-local constraint. Also, we apply differentiable NAS to named entity recognition (NER). It is the first time that differentiable NAS methods are adopted in NLP tasks other than language modeling. On both the PTB language modeling and CoNLL-2003 English NER data, our method outperforms strong baselines. It achieves a new state-of-the-art on the NER task.

pdf bib
The NiuTrans Machine Translation Systems for WMT19
Bei Li | Yinqiao Li | Chen Xu | Ye Lin | Jiqiang Liu | Hui Liu | Ziyang Wang | Yuhao Zhang | Nuo Xu | Zeyang Wang | Kai Feng | Hexuan Chen | Tengbo Liu | Yanyang Li | Qiang Wang | Tong Xiao | Jingbo Zhu
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper described NiuTrans neural machine translation systems for the WMT 2019 news translation tasks. We participated in 13 translation directions, including 11 supervised tasks, namely EN↔{ZH, DE, RU, KK, LT}, GU→EN and the unsupervised DE↔CS sub-track. Our systems were built on Deep Transformer and several back-translation methods. Iterative knowledge distillation and ensemble+reranking were also employed to obtain stronger models. Our unsupervised submissions were based on NMT enhanced by SMT. As a result, we achieved the highest BLEU scores in {KK↔EN, GU→EN} directions, ranking 2nd in {RU→EN, DE↔CS} and 3rd in {ZH→EN, LT→EN, EN→RU, EN↔DE} among all constrained submissions.

pdf bib
Learning Deep Transformer Models for Machine Translation
Qiang Wang | Bei Li | Tong Xiao | Jingbo Zhu | Changliang Li | Derek F. Wong | Lidia S. Chao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT’16 English-German and NIST OpenMT’12 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big.

pdf bib
Shared-Private Bilingual Word Embeddings for Neural Machine Translation
Xuebo Liu | Derek F. Wong | Yang Liu | Lidia S. Chao | Tong Xiao | Jingbo Zhu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embedding is central to neural machine translation (NMT), which has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they indirectly interface via a soft-attention mechanism, which makes them comparatively isolated. In this paper, we propose shared-private bilingual word embeddings, which give a closer relationship between the source and target embeddings, and which also reduce the number of model parameters. For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units. Experiments on 5 language pairs belonging to 6 different language families and written in 5 different alphabets demonstrate that the proposed model provides a significant performance boost over the strong baselines with dramatically fewer model parameters.

2018

pdf bib
The NiuTrans Machine Translation System for WMT18
Qiang Wang | Bei Li | Jiqiang Liu | Bojian Jiang | Zheyang Zhang | Yinqiao Li | Ye Lin | Tong Xiao | Jingbo Zhu
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission of the NiuTrans neural machine translation system for the WMT 2018 Chinese ↔ English news translation tasks. Our baseline systems are based on the Transformer architecture. We further improve the translation performance 2.4-2.6 BLEU points from four aspects, including architectural improvements, diverse ensemble decoding, reranking, and post-processing. Among constrained submissions, we rank 2nd out of 16 submitted systems on Chinese → English task and 3rd out of 16 on English → Chinese task, respectively.

pdf bib
A Simple and Effective Approach to Coverage-Aware Neural Machine Translation
Yanyang Li | Tong Xiao | Yinqiao Li | Qiang Wang | Changming Xu | Jingbo Zhu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We offer a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT). Unlike the popular length normalization and coverage models, our model does not require training nor reranking the limited n-best outputs. Moreover, it is robust to large beam sizes, which is not well studied in previous work. On the Chinese-English and English-German translation tasks, our approach yields +0.4 1.5 BLEU improvements over the state-of-the-art baselines.

pdf bib
Detecting Free Translation in Parallel Corpora from Attention Scores
Qi Chen | Oi Yee Kwong | Jingbo Zhu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

pdf bib
Multi-layer Representation Fusion for Neural Machine Translation
Qiang Wang | Fuxue Li | Tong Xiao | Yanyang Li | Yinqiao Li | Jingbo Zhu
Proceedings of the 27th International Conference on Computational Linguistics

Neural machine translation systems require a number of stacked layers for deep models. But the prediction depends on the sentence representation of the top-most layer with no access to low-level representations. This makes it more difficult to train the model and poses a risk of information loss to prediction. In this paper, we propose a multi-layer representation fusion (MLRF) approach to fusing stacked layers. In particular, we design three fusion functions to learn a better representation from the stack. Experimental results show that our approach yields improvements of 0.92 and 0.56 BLEU points over the strong Transformer baseline on IWSLT German-English and NIST Chinese-English MT tasks respectively. The result is new state-of-the-art in German-English translation.

2017

pdf bib
Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation
Baosong Yang | Derek F. Wong | Tong Xiao | Lidia S. Chao | Jingbo Zhu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical and phrase vectors. Using a tree-based rare word encoding, the proposed model is extended to sub-word level to alleviate the out-of-vocabulary (OOV) problem. Empirical results reveal that the proposed model significantly outperforms sequence-to-sequence attention-based and tree-based neural translation models in English-Chinese translation tasks.

2015

pdf bib
NiuParser: A Chinese Syntactic and Semantic Parsing Toolkit
Jingbo Zhu | Muhua Zhu | Qiang Wang | Tong Xiao
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

pdf bib
Syntactic SMT Using a Discriminative Text Generation Model
Yue Zhang | Kai Song | Linfeng Song | Jingbo Zhu | Qun Liu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Effective Incorporation of Source Syntax into Hierarchical Phrase-based Translation
Tong Xiao | Adrià de Gispert | Jingbo Zhu | Bill Byrne
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Tagging The Web: Building A Robust Web Tagger with Neural Network
Ji Ma | Yue Zhang | Jingbo Zhu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Hybrid Approach to Skeleton-based Translation
Tong Xiao | Jingbo Zhu | Chunliang Zhang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Punctuation Processing for Projective Dependency Parsing
Ji Ma | Yue Zhang | Jingbo Zhu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Kalina Bontcheva | Jingbo Zhu
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2013

pdf bib
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing
Liang-Chih Yu | Yuen-Hsien Tseng | Jingbo Zhu | Fuji Ren
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

pdf bib
Fast and Accurate Shift-Reduce Constituent Parsing
Muhua Zhu | Yue Zhang | Wenliang Chen | Min Zhang | Jingbo Zhu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Easy-First POS Tagging and Dependency Parsing with Beam Search
Ji Ma | Jingbo Zhu | Tong Xiao | Nan Yang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Learning Better Rule Extraction with Translation Span Alignment
Jingbo Zhu | Tong Xiao | Chunliang Zhang
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Tong Xiao | Jingbo Zhu | Hao Zhang | Qiang Li
Proceedings of the ACL 2012 System Demonstrations

pdf bib
Easy-First Chinese POS Tagging and Dependency Parsing
Ji Ma | Tong Xiao | Jingbo Zhu | Feiliang Ren
Proceedings of COLING 2012

pdf bib
Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing
Muhua Zhu | Jingbo Zhu | Huizhen Wang
Proceedings of COLING 2012

pdf bib
NEU Systems in SIGHAN Bakeoff 2012
Ji Ma | LongFei Bai | Zhuo Liu | Ao Zhang | Jingbo Zhu
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2011

pdf bib
Document-level Consistency Verification in Machine Translation
Tong Xiao | Jingbo Zhu | Shujie Yao | Hao Zhang
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Improving Decoding Generalization for Tree-to-String Translation
Jingbo Zhu | Tong Xiao
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Better Automatic Treebank Conversion Using A Feature-Based Approach
Muhua Zhu | Jingbo Zhu | Minghan Hu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Boosting-Based System Combination for Machine Translation
Tong Xiao | Jingbo Zhu | Muhua Zhu | Huizhen Wang
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Heterogeneous Parsing via Collaborative Decoding
Muhua Zhu | Jingbo Zhu | Tong Xiao
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
An Empirical Study of Translation Rule Extraction with Multiple Parsers
Tong Xiao | Jingbo Zhu | Hao Zhang | Muhua Zhu
Coling 2010: Posters

pdf bib
Automatic Treebank Conversion via Informed Decoding
Muhua Zhu | Jingbo Zhu
Coling 2010: Posters

pdf bib
Mining Large-scale Parallel Corpora from Multilingual Patents: An English-Chinese example and its application to SMT
Bin Lu | Benjamin K. Tsou | Tao Jiang | Oi Yee Kwong | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
High OOV-Recall Chinese Word Segmenter
Xiaoming Xu | Muhua Zhu | Xiaoxu Fei | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Chinese Syntactic Parsing Evaluation
Qiang Zhou | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
A Multi-stage Clustering Framework for Chinese Personal Name Disambiguation
Huizhen Wang | Haibo Ding | Yingchao Shi | Ji Ma | Xiao Zhou | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
NEUNLPLab Chinese Word Sense Induction System for SIGHAN Bakeoff 2010
Hao Zhang | Tong Xiao | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

pdf bib
Chinese-English Organization Name Translation Based on Correlative Expansion
Feiliang Ren | Muhua Zhu | Huizhen Wang | Jingbo Zhu
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
The Construction of a Chinese-English Patent Parallel Corpus
Bin Lu | Benjamin K. Tsou | Jingbo Zhu | Tao Jiang | Oi Yee Kwong
Proceedings of the Third Workshop on Patent Translation

pdf bib
Better Synchronous Binarization for Machine Translation
Tong Xiao | Mu Li | Dongdong Zhang | Jingbo Zhu | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
Multi-Criteria-Based Strategy to Stop Active Learning for Data Annotation
Jingbo Zhu | Huizhen Wang | Eduard Hovy
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification
Jingbo Zhu | Huizhen Wang | Tianshun Yao | Benjamin K Tsou
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification
Jingbo Zhu | Huizhen Wang | Eduard Hovy
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Towards Automated Semantic Analysis on Biomedical Research Articles
Donghui Feng | Gully Burns | Jingbo Zhu | Eduard Hovy
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
An Effective Hybrid Machine Learning Approach for Coreference Resolution
Feiliang Ren | Jingbo Zhu
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

pdf bib
Which Performs Better on In-Vocabulary Word Segmentation: Based on Word or Character?
Zhenxing Wang | Changning Huang | Jingbo Zhu
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

pdf bib
The Character-based CRF Segmenter of MSRA&NEU for the 4th Bakeoff
Zhenxing Wang | Changning Huang | Jingbo Zhu
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

2007

pdf bib
Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem
Jingbo Zhu | Eduard Hovy
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Designing Special Post-Processing Rules for SVM-Based Chinese Word Segmentation
Muhua Zhu | Yilin Wang | Zhenxing Wang | Huizhen Wang | Jingbo Zhu
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

pdf bib
Using Multiple Discriminant Analysis Approach for Linear Text Segmentation
Jingbo Zhu | Na Ye | Xinzhi Chang | Wenliang Chen | Benjamin K Tsou
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Some Studies on Chinese Domain Knowledge Dictionary and Its Application to Text Classification
Jingbo Zhu | Wenliang Chen
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

2002

pdf bib
A Knowledge-based Approach to Text Classification
Jingbo Zhu | Tianshun Yao
COLING-02: The First SIGHAN Workshop on Chinese Language Processing

Search