Degen Huang

Also published as: De-Gen Huang


Continual Learning for Multilingual Neural Machine Translation via Dual Importance-based Model Division
Junpeng Liu | Kaiyu Huang | Hao Yu | Jiuyi Li | Jinsong Su | Degen Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

A persistent goal of multilingual neural machine translation (MNMT) is to continually adapt the model to support new language pairs or improve some current language pairs without accessing the previous training data. To achieve this, the existing methods primarily focus on preventing catastrophic forgetting by making compromises between the original and new language pairs, leading to sub-optimal performance on both translation tasks. To mitigate this problem, we propose a dual importance-based model division method to divide the model parameters into two parts and separately model the translation of the original and new tasks. Specifically, we first remove the parameters that are negligible to the original tasks but essential to the new tasks to obtain a pruned model, which is responsible for the original translation tasks. Then we expand the pruned model with external parameters and fine-tune the newly added parameters with new training data. The whole fine-tuned model will be used for the new translation tasks. Experimental results show that our method can efficiently adapt the original model to various new translation tasks while retaining the performance of the original tasks. Further analyses demonstrate that our method consistently outperforms several strong baselines under different incremental translation scenarios.
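
The division step in the abstract above can be illustrated with a minimal sketch. The abstract does not say how importance is measured or how the cut-off is chosen, so the function name `divide_parameters`, the per-parameter importance scores, and the two thresholds are all hypothetical stand-ins rather than the authors' method.

```python
def divide_parameters(old_importance, new_importance, old_threshold, new_threshold):
    """Split parameter indices into (kept, pruned).

    Assumption: a parameter is pruned when its importance to the original
    tasks is at most old_threshold AND its importance to the new tasks is
    at least new_threshold. The scores and thresholds are illustrative.
    """
    kept, pruned = [], []
    for idx, (old, new) in enumerate(zip(old_importance, new_importance)):
        if old <= old_threshold and new >= new_threshold:
            pruned.append(idx)  # negligible for original tasks, useful for new ones
        else:
            kept.append(idx)    # stays in the pruned model serving the original tasks
    return kept, pruned


if __name__ == "__main__":
    old = [0.9, 0.1, 0.05, 0.8]
    new = [0.2, 0.7, 0.1, 0.9]
    print(divide_parameters(old, new, old_threshold=0.2, new_threshold=0.5))
```

In this toy setting the kept indices would form the model for the original tasks, and the pruned slots would be the ones re-trained on new data.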

DUTNLP System for the WMT2023 Discourse-Level Literary Translation
Anqi Zhao | Kaiyu Huang | Hao Yu | Degen Huang
Proceedings of the Eighth Conference on Machine Translation

This paper describes the DUTNLP Lab submission to the WMT23 Discourse-Level Literary Translation task in the Chinese-to-English translation direction under unconstrained conditions. Our primary system aims to leverage a large language model with various prompt strategies, which can fully investigate the potential capabilities of large language models for discourse-level neural machine translation. Moreover, we test a widely used discourse-level machine translation model, G-transformer, with different training strategies. In our experimental results, the method with large language models achieves a BLEU score of 28.16, while the fine-tuned method scores 25.26. These findings indicate that selecting appropriate prompt strategies based on large language models can significantly improve translation performance compared to traditional model training methods.

Exploring Better Text Image Translation with Multimodal Codebook
Zhibin Lan | Jiawei Yu | Xiang Li | Wen Zhang | Jian Luan | Bin Wang | Degen Huang | Jinsong Su
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text image translation (TIT) aims to translate the source texts embedded in the image to target translations, which has a wide range of applications and thus has important research value. However, current studies on TIT are confronted with two main bottlenecks: 1) this task lacks a publicly available TIT dataset, 2) dominant models are constructed in a cascaded manner, which tends to suffer from the error propagation of optical character recognition (OCR). In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts, providing useful supplementary information for translation. Moreover, we present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts, OCR dataset and our OCRMT30K dataset to train our model. Extensive experiments and in-depth analyses strongly demonstrate the effectiveness of our proposed model and training framework.

BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation
Liyan Kang | Luyang Huang | Ningxin Peng | Peihao Zhu | Zewei Sun | Shanbo Cheng | Mingxuan Wang | Degen Huang | Jinsong Su
Findings of the Association for Computational Linguistics: ACL 2023

We present a large-scale video subtitle translation dataset, *BigVideo*, to facilitate the study of multi-modality machine translation. Compared with the widely used *How2* and *VaTeX* datasets, *BigVideo* is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of videos. We also introduce two deliberately designed test sets to verify the necessity of visual information: *Ambiguous* with the presence of ambiguous words, and *Unambiguous* in which the text context is self-contained for translation. To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. Extensive experiments on *BigVideo* show that: a) Visual information consistently improves the NMT model in terms of BLEU, BLEURT and COMET on both *Ambiguous* and *Unambiguous* test sets. b) Visual information helps disambiguation, compared to the strong text baseline on terminology-targeted scores and human evaluation.


Towards Robust k-Nearest-Neighbor Machine Translation
Hui Jiang | Ziyao Lu | Fandong Meng | Chulun Zhou | Jie Zhou | Degen Huang | Jinsong Su
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction of NMT in recent years. Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model. However, the underlying retrieved noisy pairs will dramatically deteriorate the model performance. In this paper, we conduct a preliminary study and find that this problem results from not fully exploiting the prediction of the NMT model. To alleviate the impact of noise, we propose a confidence-enhanced kNN-MT model with robust training. Concretely, we introduce the NMT confidence to refine the modeling of two important components of kNN-MT: the kNN distribution and the interpolation weight. Meanwhile, we inject two types of perturbations into the retrieved pairs for robust training. Experimental results on four benchmark datasets demonstrate that our model not only achieves significant improvements over current kNN-MT models, but also exhibits better robustness. Our code is available at
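
The two components named in the abstract above, the kNN distribution and the interpolation weight, can be sketched in their plain, commonly described form. This is not the confidence-enhanced model of the paper: the weight `lam` is a fixed number here, and the function names, the temperature value, and the toy tokens are assumptions made for illustration.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=10.0):
    """Turn retrieved (target_token, distance) pairs into a distribution.

    Each neighbor contributes exp(-distance / temperature) to its token;
    the scores are then normalized to sum to 1.
    """
    if not neighbors:
        return {}
    scores = defaultdict(float)
    for token, distance in neighbors:
        scores[token] += math.exp(-distance / temperature)
    total = sum(scores.values())
    return {token: score / total for token, score in scores.items()}


def interpolate(p_nmt, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_nmt."""
    vocab = set(p_nmt) | set(p_knn)
    return {
        token: lam * p_knn.get(token, 0.0) + (1.0 - lam) * p_nmt.get(token, 0.0)
        for token in vocab
    }


if __name__ == "__main__":
    p_knn = knn_distribution([("cat", 1.0), ("cat", 1.0), ("dog", 1.0)], temperature=1.0)
    print(interpolate({"cat": 0.5, "dog": 0.5}, p_knn, lam=0.5))
```

With a fixed `lam`, noisy neighbors shift the final distribution regardless of how confident the NMT model is, which is the weakness the abstract says the confidence-based refinement targets.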

Adaptive Token-level Cross-lingual Feature Mixing for Multilingual Neural Machine Translation
Junpeng Liu | Kaiyu Huang | Jiuyi Li | Huan Liu | Jinsong Su | Degen Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multilingual neural machine translation aims to translate multiple language pairs in a single model and has shown great success thanks to the knowledge transfer across languages with the shared parameters. Although promising, this share-all paradigm suffers from insufficient ability to capture language-specific features. Currently, the common practice is to insert or search language-specific networks to balance the shared and specific features. However, those two types of features are not sufficient to model the complex commonality and divergence across languages, such as the locally shared features among similar languages, which leads to sub-optimal transfer, especially in massively multilingual translation. In this paper, we propose a novel token-level feature mixing method that enables the model to capture different features and dynamically determine the feature sharing across languages. Based on the observation that the tokens in the multilingual model are usually shared by different languages, we insert a feature mixing layer into each Transformer sublayer and model each token representation as a mix of different features, with a proportion indicating its feature preference. In this way, we can perform fine-grained feature sharing and achieve better multilingual transfer. Experimental results on multilingual datasets show that our method outperforms various strong baselines and can be extended to zero-shot translation. Further analyses reveal that our method can capture different linguistic features and bridge the representation gap across languages.
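
The idea of modeling a token representation as a proportioned mix of features can be sketched as a softmax-weighted sum. The abstract does not describe how the proportions are computed, so the softmax gate, the function name `mix_features`, and the toy vectors below are assumptions, not the layer used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_features(features, gate_logits):
    """Combine K feature vectors for one token into a single vector.

    features:    list of K vectors of equal length (e.g. shared and
                 language-specific views of the same token).
    gate_logits: K scores for this token; their softmax gives the
                 mixing proportions (an assumed gating scheme).
    """
    proportions = softmax(gate_logits)
    dim = len(features[0])
    mixed = [
        sum(p * vec[i] for p, vec in zip(proportions, features))
        for i in range(dim)
    ]
    return mixed, proportions


if __name__ == "__main__":
    mixed, proportions = mix_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(mixed, proportions)
```

Because the proportions are computed per token, two tokens in the same sentence can lean on different features, which is what makes the sharing fine-grained.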

DUTNLP Machine Translation System for WMT22 General MT Task
Ting Wang | Huan Liu | Junpeng Liu | Degen Huang
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes DUTNLP Lab’s submission to the WMT22 General MT Task on four translation directions: English to/from Chinese and English to/from Japanese under the constrained condition. Our primary systems are built on several Transformer variants which employ a wider FFN layer or a deeper encoder. The bilingual data are filtered by detailed data pre-processing strategies and four data augmentation methods are combined to enlarge the training data with the provided monolingual data. Several common methods are also employed to further improve the model performance, such as fine-tuning, model ensemble and post-editing. As a result, our constrained systems achieve 29.01, 63.87, 41.84, and 24.82 BLEU scores on Chinese-to-English, English-to-Chinese, English-to-Japanese, and Japanese-to-English, respectively.


Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability
Kaiyu Huang | Junpeng Liu | Degen Huang | Deyi Xiong | Zhuang Liu | Jinsong Su
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation
Kaiyu Huang | Hao Yu | Junpeng Liu | Wei Liu | Jingxiang Cao | Degen Huang
Findings of the Association for Computational Linguistics: EMNLP 2021

Precise information of word boundary can alleviate the problem of lexical ambiguity to improve the performance of natural language processing (NLP) tasks. Thus, Chinese word segmentation (CWS) is a fundamental task in NLP. Due to the development of pre-trained language models (PLM), pre-trained knowledge can help neural methods solve the main problems of the CWS in significant measure. Existing methods have already achieved high performance on several benchmarks (e.g., Bakeoff-2005). However, recent outstanding studies are limited by the small-scale annotated corpus. To further improve the performance of CWS methods based on fine-tuning the PLMs, we propose a novel neural framework, LBGCN, which incorporates a lexicon-based graph convolutional network into the Transformer encoder. Experimental results on five benchmarks and four cross-domain datasets show the lexicon-based graph convolutional network successfully captures the information of candidate words and helps to improve performance on the benchmarks (Bakeoff-2005 and CTB6) and the cross-domain datasets (SIGHAN-2010). Further experiments and analyses demonstrate that our proposed framework effectively models the lexicon to enhance the ability of basic neural frameworks and strengthens the robustness in the cross-domain scenario.

Towards User-Driven Neural Machine Translation
Huan Lin | Liang Yao | Baosong Yang | Dayiheng Liu | Haibo Zhang | Weihua Luo | Degen Huang | Jinsong Su
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A good translation should not only translate the original content semantically, but also incarnate personal traits of the original text. For a real-world neural machine translation (NMT) system, these user traits (e.g., topic preference, stylistic characteristics and expression habits) can be preserved in user behavior (e.g., historical inputs). However, current NMT systems marginally consider the user behavior due to: 1) the difficulty of modeling user portraits in zero-shot scenarios, and 2) the lack of user-behavior annotated parallel dataset. To fill this gap, we introduce a novel framework called user-driven NMT. Specifically, a cache-based module and a user-driven contrastive learning method are proposed to offer NMT the ability to capture potential user traits from their historical inputs under a zero-shot learning fashion. Furthermore, we contribute the first Chinese-English parallel corpus annotated with user behavior called UDT-Corpus. Experimental results confirm that the proposed user-driven NMT can generate user-specific translations.

Exploring Dynamic Selection of Branch Expansion Orders for Code Generation
Hui Jiang | Chulun Zhou | Fandong Meng | Biao Zhang | Jie Zhou | Degen Huang | Qingqiang Wu | Jinsong Su
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Due to the great potential in facilitating software development, code generation has attracted increasing attention recently. Generally, dominant models are Seq2Tree models, which convert the input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an Abstract Syntax Tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector, which is able to dynamically determine optimal expansion orders of branches for multi-branch nodes. Particularly, since the selection of expansion orders is a non-differentiable multi-step operation, we optimize the selector through reinforcement learning, and formulate the reward function as the difference of model losses obtained through different expansion orders. Experimental results and in-depth analysis on several commonly-used datasets demonstrate the effectiveness and generality of our approach. We have released our code at

Improving Graph-based Sentence Ordering with Iteratively Predicted Pairwise Orderings
Shaopeng Lai | Ante Wang | Fandong Meng | Jie Zhou | Yubin Ge | Jiali Zeng | Junfeng Yao | Degen Huang | Jinsong Su
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Dominant sentence ordering models can be classified into pairwise ordering models and set-to-sequence models. However, there have been few attempts to combine these two types of models, which intuitively possess complementary advantages. In this paper, we propose a novel sentence ordering framework which introduces two classifiers to make better use of pairwise orderings for graph-based sentence ordering (Yin et al. 2019, 2021). Specifically, given an initial sentence-entity graph, we first introduce a graph-based classifier to predict pairwise orderings between linked sentences. Then, in an iterative manner, based on the graph updated by previously predicted high-confidence pairwise orderings, another classifier is used to predict the remaining uncertain pairwise orderings. Finally, we adapt a GRN-based sentence ordering model (Yin et al. 2019, 2021) on the basis of the final graph. Experiments on five commonly-used datasets demonstrate the effectiveness and generality of our model. Particularly, when equipped with BERT (Devlin et al. 2019) and FHDecoder (Yin et al. 2020), our model achieves state-of-the-art performance. Our code is available at

DUTNLP Machine Translation System for WMT21 Triangular Translation Task
Huan Liu | Junpeng Liu | Kaiyu Huang | Degen Huang
Proceedings of the Sixth Conference on Machine Translation

This paper describes DUT-NLP Lab’s submission to the WMT-21 triangular machine translation shared task. The participants are not allowed to use other data and the translation direction of this task is Russian-to-Chinese. In this task, we use the Transformer as our baseline model, and integrate several techniques to enhance the performance of the baseline, including data filtering, data selection, fine-tuning, and post-editing. Further, to make use of the English resources, such as Russian/English and Chinese/English parallel data, the relationship triangle is constructed by multilingual neural machine translation systems. As a result, our submission achieves a BLEU score of 21.9 in Russian-to-Chinese.


A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation
Kaiyu Huang | Degen Huang | Zhuang Liu | Fengran Mo
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word-level information is important in natural language processing (NLP), especially for the Chinese language due to its high linguistic complexity. Chinese word segmentation (CWS) is an essential task for Chinese downstream NLP tasks. Existing methods have already achieved competitive performance for CWS on large-scale annotated corpora. However, their accuracy drops dramatically on unsegmented text with many out-of-vocabulary (OOV) words. In addition, there are many different segmentation criteria for addressing different requirements of downstream NLP tasks. Keeping a separate model for each criterion leads to explosive growth in the total number of parameters. To this end, we propose a joint multiple criteria model that shares all parameters to integrate different segmentation criteria into one model. Besides, we utilize a transfer learning method to improve the performance on OOV words. Our proposed method is evaluated by designing comprehensive experiments on multiple benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008 and SIGHAN 2010). Our method achieves state-of-the-art performance on all datasets. Importantly, our method also shows competitive practicability and generalization ability for the CWS task.

Context-Aware Word Segmentation for Chinese Real-World Discourse
Kaiyu Huang | Junpeng Liu | Jingxiang Cao | Degen Huang
Proceedings of the Second International Workshop of Discourse Processing

Previous neural approaches achieve significant progress for Chinese word segmentation (CWS) as a sentence-level task, but they suffer from limitations in real-world scenarios. In this paper, we address this issue with a context-aware method and optimize the solution at the document level. This paper proposes a three-step strategy to improve the performance for discourse CWS. First, the method utilizes an auxiliary segmenter to remedy the limitations of the pre-segmenter. Then the context-aware algorithm computes the confidence of each split. The maximum probability path is reconstructed via this algorithm. Besides, in order to evaluate the performance in discourse, we build a new benchmark consisting of the latest news and Chinese medical articles. Extensive experiments on this benchmark show that our proposed method achieves competitive performance in a document-level real-world scenario for CWS.


Research on attention memory networks as a model for learning natural language inference
Zhuang Liu | Degen Huang | Jing Zhang | Kaiyu Huang
Proceedings of the Workshop on Structured Prediction for NLP


Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification
HuiWei Zhou | Long Chen | Fulin Shi | Degen Huang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)


Improving Feature-Based Biomedical Event Extraction System by Integrating Argument Information
Lishuang Li | Yiwen Wang | Degen Huang
Proceedings of the BioNLP Shared Task 2013 Workshop


Rules-based Chinese Word Segmentation on MicroBlog for CIPS-SIGHAN on CLP2012
Jing Zhang | Degen Huang | Xia Han | Wei Wang
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing


Combining Syntactic and Semantic Features by SVM for Unrestricted Coreference Resolution
Huiwei Zhou | Yao Li | Degen Huang | Yan Zhang | Chunlong Wu | Yuansheng Yang
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

POS Tagging of English Particles for Machine Translation
Jianjun Ma | Degen Huang | Haixia Liu | Wenfeng Sheng
Proceedings of Machine Translation Summit XIII: Papers


Exploiting Multi-Features to Detect Hedges and their Scope in Biomedical Texts
Huiwei Zhou | Xiaoyan Li | Degen Huang | Zezhong Li | Yuansheng Yang
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

HMM Revises Low Marginal Probability by CRF for Chinese Word Segmentation
Degen Huang | Deqin Tong | Yanyan Luo
CIPS-SIGHAN Joint Conference on Chinese Language Processing

DLUT: Chinese Personal Name Disambiguation with Rich Features
Dongliang Wang | Degen Huang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Mining Large-scale Comparable Corpora from Chinese-English News Collections
Degen Huang | Lian Zhao | Lishuang Li | Haitao Yu
Coling 2010: Posters


HMM and CRF Based Hybrid Model for Chinese Lexical Analysis
Degen Huang | Xiao Sun | Shidou Jiao | Lishuang Li | Zhuoye Ding | Ru Wan
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing


Hybrid Models for Chinese Named Entity Recognition
Lishuang Li | Tingting Mao | Degen Huang | Yuansheng Yang
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing


Chinese Main Verb Identification: From Specification to Realization
Bing-Gong Ding | Chang-Ning Huang | De-Gen Huang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 1, March 2005