Degen Huang

Also published as: De-Gen Huang


2021

Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability
Kaiyu Huang | Junpeng Liu | Degen Huang | Deyi Xiong | Zhuang Liu | Jinsong Su
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation
Kaiyu Huang | Hao Yu | Junpeng Liu | Wei Liu | Jingxiang Cao | Degen Huang
Findings of the Association for Computational Linguistics: EMNLP 2021

Precise word boundary information can alleviate lexical ambiguity and thereby improve the performance of natural language processing (NLP) tasks. Thus, Chinese word segmentation (CWS) is a fundamental task in NLP. With the development of pre-trained language models (PLMs), pre-trained knowledge can help neural methods solve the main problems of CWS to a significant degree. Existing methods have already achieved high performance on several benchmarks (e.g., Bakeoff-2005). However, recent outstanding studies are limited by small-scale annotated corpora. To further improve the performance of CWS methods based on fine-tuning PLMs, we propose a novel neural framework, LBGCN, which incorporates a lexicon-based graph convolutional network into the Transformer encoder. Experimental results on five benchmarks and four cross-domain datasets show that the lexicon-based graph convolutional network successfully captures the information of candidate words and helps to improve performance on the benchmarks (Bakeoff-2005 and CTB6) and the cross-domain datasets (SIGHAN-2010). Further experiments and analyses demonstrate that our proposed framework effectively models the lexicon to enhance the ability of basic neural frameworks and strengthens robustness in the cross-domain scenario.

DUTNLP Machine Translation System for WMT21 Triangular Translation Task
Huan Liu | Junpeng Liu | Kaiyu Huang | Degen Huang
Proceedings of the Sixth Conference on Machine Translation

This paper describes DUT-NLP Lab’s submission to the WMT-21 triangular machine translation shared task. Participants are not allowed to use other data, and the translation direction of the task is Russian-to-Chinese. We use the Transformer as our baseline model and integrate several techniques to enhance its performance, including data filtering, data selection, fine-tuning, and post-editing. Further, to make use of English resources such as Russian/English and Chinese/English parallel data, the relationship triangle is constructed by multilingual neural machine translation systems. As a result, our submission achieves a BLEU score of 21.9 in Russian-to-Chinese.

Towards User-Driven Neural Machine Translation
Huan Lin | Liang Yao | Baosong Yang | Dayiheng Liu | Haibo Zhang | Weihua Luo | Degen Huang | Jinsong Su
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A good translation should not only translate the original content semantically, but also reflect the personal traits of the original text. For a real-world neural machine translation (NMT) system, these user traits (e.g., topic preference, stylistic characteristics and expression habits) can be preserved in user behavior (e.g., historical inputs). However, current NMT systems only marginally consider user behavior due to: 1) the difficulty of modeling user portraits in zero-shot scenarios, and 2) the lack of a parallel dataset annotated with user behavior. To fill this gap, we introduce a novel framework called user-driven NMT. Specifically, a cache-based module and a user-driven contrastive learning method are proposed to give NMT the ability to capture potential user traits from their historical inputs in a zero-shot learning fashion. Furthermore, we contribute the first Chinese-English parallel corpus annotated with user behavior, called UDT-Corpus. Experimental results confirm that the proposed user-driven NMT can generate user-specific translations.

Exploring Dynamic Selection of Branch Expansion Orders for Code Generation
Hui Jiang | Chulun Zhou | Fandong Meng | Biao Zhang | Jie Zhou | Degen Huang | Qingqiang Wu | Jinsong Su
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Due to its great potential in facilitating software development, code generation has attracted increasing attention recently. Generally, dominant models are Seq2Tree models, which convert the input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an Abstract Syntax Tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector, which is able to dynamically determine optimal expansion orders of branches for multi-branch nodes. In particular, since the selection of expansion orders is a non-differentiable multi-step operation, we optimize the selector through reinforcement learning, and formulate the reward function as the difference of model losses obtained through different expansion orders. Experimental results and in-depth analysis on several commonly used datasets demonstrate the effectiveness and generality of our approach. We have released our code at https://github.com/DeepLearnXMU/CG-RL.

Improving Graph-based Sentence Ordering with Iteratively Predicted Pairwise Orderings
Shaopeng Lai | Ante Wang | Fandong Meng | Jie Zhou | Yubin Ge | Jiali Zeng | Junfeng Yao | Degen Huang | Jinsong Su
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Dominant sentence ordering models can be classified into pairwise ordering models and set-to-sequence models. However, there have been few attempts to combine these two types of models, which intuitively possess complementary advantages. In this paper, we propose a novel sentence ordering framework which introduces two classifiers to make better use of pairwise orderings for graph-based sentence ordering (Yin et al. 2019, 2021). Specifically, given an initial sentence-entity graph, we first introduce a graph-based classifier to predict pairwise orderings between linked sentences. Then, in an iterative manner, based on the graph updated by previously predicted high-confidence pairwise orderings, another classifier is used to predict the remaining uncertain pairwise orderings. Finally, we adapt a GRN-based sentence ordering model (Yin et al. 2019, 2021) on the basis of the final graph. Experiments on five commonly used datasets demonstrate the effectiveness and generality of our model. In particular, when equipped with BERT (Devlin et al. 2019) and FHDecoder (Yin et al. 2020), our model achieves state-of-the-art performance. Our code is available at https://github.com/DeepLearnXMU/IRSEG.

2020

A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation
Kaiyu Huang | Degen Huang | Zhuang Liu | Fengran Mo
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word-level information is important in natural language processing (NLP), especially for the Chinese language due to its high linguistic complexity. Chinese word segmentation (CWS) is an essential task for Chinese downstream NLP tasks. Existing methods have already achieved competitive performance for CWS on large-scale annotated corpora. However, the accuracy of these methods drops dramatically when they handle unsegmented text with many out-of-vocabulary (OOV) words. In addition, there are many different segmentation criteria for addressing the different requirements of downstream NLP tasks. Maintaining a separate model for each criterion causes explosive growth in the total number of parameters. To this end, we propose a joint multiple criteria model that shares all parameters to integrate different segmentation criteria into one model. Besides, we utilize a transfer learning method to improve performance on OOV words. Our proposed method is evaluated through comprehensive experiments on multiple benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008 and SIGHAN 2010). Our method achieves state-of-the-art performance on all datasets. Importantly, our method also shows competitive practicability and generalization ability for the CWS task.

Context-Aware Word Segmentation for Chinese Real-World Discourse
Kaiyu Huang | Junpeng Liu | Jingxiang Cao | Degen Huang
Proceedings of the Second International Workshop of Discourse Processing

Previous neural approaches have achieved significant progress on Chinese word segmentation (CWS) as a sentence-level task, but they suffer from limitations in real-world scenarios. In this paper, we address this issue with a context-aware method and optimize the solution at the document level. This paper proposes a three-step strategy to improve the performance of discourse CWS. First, the method utilizes an auxiliary segmenter to remedy the limitations of the pre-segmenter. Then the context-aware algorithm computes the confidence of each split. The maximum probability path is reconstructed via this algorithm. Besides, in order to evaluate performance on discourse, we build a new benchmark consisting of the latest news and Chinese medical articles. Extensive experiments on this benchmark show that our proposed method achieves competitive performance in a document-level real-world scenario for CWS.

2016

Research on attention memory networks as a model for learning natural language inference
Zhuang Liu | Degen Huang | Jing Zhang | Kaiyu Huang
Proceedings of the Workshop on Structured Prediction for NLP

2015

Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification
HuiWei Zhou | Long Chen | Fulin Shi | Degen Huang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2013

Improving Feature-Based Biomedical Event Extraction System by Integrating Argument Information
Lishuang Li | Yiwen Wang | Degen Huang
Proceedings of the BioNLP Shared Task 2013 Workshop

2012

Rules-based Chinese Word Segmentation on MicroBlog for CIPS-SIGHAN on CLP2012
Jing Zhang | Degen Huang | Xia Han | Wei Wang
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2011

POS Tagging of English Particles for Machine Translation
Jianjun Ma | Degen Huang | Haixia Liu | Wenfeng Sheng
Proceedings of Machine Translation Summit XIII: Papers

Combining Syntactic and Semantic Features by SVM for Unrestricted Coreference Resolution
Huiwei Zhou | Yao Li | Degen Huang | Yan Zhang | Chunlong Wu | Yuansheng Yang
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

2010

Mining Large-scale Comparable Corpora from Chinese-English News Collections
Degen Huang | Lian Zhao | Lishuang Li | Haitao Yu
Coling 2010: Posters

Exploiting Multi-Features to Detect Hedges and their Scope in Biomedical Texts
Huiwei Zhou | Xiaoyan Li | Degen Huang | Zezhong Li | Yuansheng Yang
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

HMM Revises Low Marginal Probability by CRF for Chinese Word Segmentation
Degen Huang | Deqin Tong | Yanyan Luo
CIPS-SIGHAN Joint Conference on Chinese Language Processing

DLUT: Chinese Personal Name Disambiguation with Rich Features
Dongliang Wang | Degen Huang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2008

HMM and CRF Based Hybrid Model for Chinese Lexical Analysis
Degen Huang | Xiao Sun | Shidou Jiao | Lishuang Li | Zhuoye Ding | Ru Wan
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

2006

Hybrid Models for Chinese Named Entity Recognition
Lishuang Li | Tingting Mao | Degen Huang | Yuansheng Yang
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Chinese Main Verb Identification: From Specification to Realization
Bing-Gong Ding | Chang-Ning Huang | De-Gen Huang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 1, March 2005