Junpeng Liu


2021

pdf bib
DUTNLP Machine Translation System for WMT21 Triangular Translation Task
Huan Liu | Junpeng Liu | Kaiyu Huang | Degen Huang
Proceedings of the Sixth Conference on Machine Translation

This paper describes DUT-NLP Lab’s submission to the WMT-21 triangular machine translation shared task. The participants are not allowed to use other data and the translation direction of this task is Russian-to-Chinese. In this task, we use the Transformer as our baseline model, and integrate several techniques to enhance the performance of the baseline, including data filtering, data selection, fine-tuning, and post-editing. Further, to make use of the English resources, such as Russian/English and Chinese/English parallel data, the relationship triangle is constructed by multilingual neural machine translation systems. As a result, our submission achieves a BLEU score of 21.9 in Russian-to-Chinese.

pdf bib
Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability
Kaiyu Huang | Junpeng Liu | Degen Huang | Deyi Xiong | Zhuang Liu | Jinsong Su
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization
Junpeng Liu | Yanyan Zou | Hainan Zhang | Hongshen Chen | Zhuoye Ding | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

Unlike well-structured text, such as news reports and encyclopedia articles, dialogue content often comes from two or more interlocutors, exchanging information with each other. In such a scenario, the topic of a conversation can vary upon progression and the key information for a certain topic is often scattered across multiple utterances of different speakers, which poses challenges to abstractly summarize dialogues. To capture the various topic information of a conversation and outline salient facts for the captured topics, this work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation objectives, which are expected to implicitly model the topic change and handle information scattering challenges for the dialogue summarization task. The proposed contrastive objectives are framed as auxiliary tasks for the primary dialogue summarization task, united via an alternative parameter updating strategy. Extensive experiments on benchmark datasets demonstrate that the proposed simple method significantly outperforms strong baselines and achieves new state-of-the-art performance. The code and trained models are publicly available via .

pdf bib
Lexicon-Based Graph Convolutional Network for Chinese Word Segmentation
Kaiyu Huang | Hao Yu | Junpeng Liu | Wei Liu | Jingxiang Cao | Degen Huang
Findings of the Association for Computational Linguistics: EMNLP 2021

Precise information of word boundary can alleviate the problem of lexical ambiguity to improve the performance of natural language processing (NLP) tasks. Thus, Chinese word segmentation (CWS) is a fundamental task in NLP. Due to the development of pre-trained language models (PLM), pre-trained knowledge can help neural methods solve the main problems of the CWS in significant measure. Existing methods have already achieved high performance on several benchmarks (e.g., Bakeoff-2005). However, recent outstanding studies are limited by the small-scale annotated corpus. To further improve the performance of CWS methods based on fine-tuning the PLMs, we propose a novel neural framework, LBGCN, which incorporates a lexicon-based graph convolutional network into the Transformer encoder. Experimental results on five benchmarks and four cross-domain datasets show the lexicon-based graph convolutional network successfully captures the information of candidate words and helps to improve performance on the benchmarks (Bakeoff-2005 and CTB6) and the cross-domain datasets (SIGHAN-2010). Further experiments and analyses demonstrate that our proposed framework effectively models the lexicon to enhance the ability of basic neural frameworks and strengthens the robustness in the cross-domain scenario.

2020

pdf bib
Context-Aware Word Segmentation for Chinese Real-World Discourse
Kaiyu Huang | Junpeng Liu | Jingxiang Cao | Degen Huang
Proceedings of the Second International Workshop of Discourse Processing

Previous neural approaches achieve significant progress for Chinese word segmentation (CWS) as a sentence-level task, but it suffers from limitations on real-world scenario. In this paper, we address this issue with a context-aware method and optimize the solution at document-level. This paper proposes a three-step strategy to improve the performance for discourse CWS. First, the method utilizes an auxiliary segmenter to remedy the limitation on pre-segmenter. Then the context-aware algorithm computes the confidence of each split. The maximum probability path is reconstructed via this algorithm. Besides, in order to evaluate the performance in discourse, we build a new benchmark consisting of the latest news and Chinese medical articles. Extensive experiments on this benchmark show that our proposed method achieves a competitive performance on a document-level real-world scenario for CWS.