Jinan Xu

Also published as: JinAn Xu


2021

pdf bib
基于多任务标签一致性机制的中文命名实体识别(Chinese Named Entity Recognition based on Multi-task Label Consistency Mechanism)
Shuning Lv (吕书宁) | Jian Liu (刘健) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫) | Yujie Zhang (张玉洁)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

“实体边界预测对中文命名实体识别至关重要。现有研究为改善边界识别效果提出的多任务学习方法仅考虑与分词任务结合,缺少多任务标签训练数据,无法学到任务的标签一致性关系。本文提出一种新的基于多任务标签一致性机制的中文命名实体识别方法:将分词和词性信息融入命名实体识别模型,使三种任务联合训练;建立基于标签一致性机制的多任务学习模式,来捕获标签一致性关系及学习多任务表示。全样本和小样本实验表明了方法的有效性。”

pdf bib
融合外部知识的开放域复述模板获取方法(An Open Domain Paraphrasing Template Acquisition Method Based on External Knowledge)
Bo Jin (金波) | Mingtong Liu (刘明童) | Yujie Zhang (张玉洁) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

“如何挖掘语言资源中丰富的复述模板,是复述研究中的一项重要任务。已有方法在人工给定种子实体对的基础上,利用实体关系,通过自举迭代方式,从开放域获取复述模板,规避对平行语料或可比语料的依赖,但是该方法需人工给定实体对,实体关系受限;在迭代过程中语义会发生偏移,影响获取质量。针对这些问题,我们考虑知识库中包含描述特定语义关系的实体对(即关系三元组),提出融合外部知识的开放域复述模板自动获取方法。首先,将关系三元组与开放域文本对齐,获取关系对应文本,并将文本中语义丰富部分泛化成变量槽,获取关系模板;接着设计模板表示方法,本文利用预训练语言模型,在模板表示中融合变量槽语义;最后,根据获得的模板表示,设计自动聚类与筛选方法,获取高精度的复述模板。在融合自动评测与人工评测的评价方法下,实验结果表明,本文提出的方法实现了在开放域数据上复述模板的自动泛化与获取,能够获得质量高、语义一致的复述模板。”

pdf bib
Modeling Bilingual Conversational Characteristics for Neural Chat Translation
Yunlong Liang | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Neural chat translation aims to translate bilingual conversational text, which has a broad application in international exchanges and cooperation. Despite the impressive performance of sentence-level and context-aware Neural Machine Translation (NMT), there still remain challenges to translate bilingual conversational text due to its inherent characteristics such as role preference, dialogue coherence, and translation consistency. In this paper, we aim to promote the translation quality of conversational text by modeling the above properties. Specifically, we design three latent variational modules to learn the distributions of bilingual conversational characteristics. Through sampling from these learned distributions, the latent variables, tailored for role preference, dialogue coherence, and translation consistency, are incorporated into the NMT model for better translation. We evaluate our approach on the benchmark dataset BConTrasT (English<->German) and a self-collected bilingual dialogue corpus, named BMELD (English<->Chinese). Extensive experiments show that our approach notably boosts the performance over strong baselines by a large margin and significantly surpasses some state-of-the-art context-aware NMT models in terms of BLEU and TER. Additionally, we make the BMELD dataset publicly available for the research community.

pdf bib
Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
Yangyifan Xu | Yijin Liu | Fandong Meng | Jiajun Zhang | Jinan Xu | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Recently, token-level adaptive training has achieved promising improvement in machine translation, where the cross-entropy loss function is adjusted by assigning different training weights to different tokens, in order to alleviate the token imbalance problem. However, previous approaches only use static word frequency information in the target language without considering the source language, which is insufficient for bilingual tasks like machine translation. In this paper, we propose a novel bilingual mutual information (BMI) based adaptive objective, which measures the learning difficulty for each target token from the perspective of bilingualism, and assigns an adaptive weight accordingly to improve token-level adaptive training. This method assigns larger training weights to tokens with higher BMI, so that easy tokens are updated with coarse granularity while difficult tokens are updated with fine granularity. Experimental results on WMT14 English-to-German and WMT19 Chinese-to-English demonstrate the superiority of our approach compared with the Transformer baseline and previous token-level adaptive training approaches. Further analyses confirm that our method can improve the lexical diversity.

pdf bib
Target-oriented Fine-tuning for Zero-Resource Named Entity Recognition
Ying Zhang | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Confidence-Aware Scheduled Sampling for Neural Machine Translation
Yijin Liu | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
A Learning-Exploring Method to Generate Diverse Paraphrases with Multi-Objective Deep Reinforcement Learning
Mingtong Liu | Erguang Yang | Deyi Xiong | Yujie Zhang | Yao Meng | Changjian Hu | Jinan Xu | Yufeng Chen
Proceedings of the 28th International Conference on Computational Linguistics

Paraphrase generation (PG) is of great importance to many downstream tasks in natural language processing. Diversity is an essential nature to PG for enhancing generalization capability and robustness of downstream applications. Recently, neural sequence-to-sequence (Seq2Seq) models have shown promising results in PG. However, traditional model training for PG focuses on optimizing model prediction against single reference and employs cross-entropy loss, which objective is unable to encourage model to generate diverse paraphrases. In this work, we present a novel approach with multi-objective learning to PG. We propose a learning-exploring method to generate sentences as learning objectives from the learned data distribution, and employ reinforcement learning to combine these new learning objectives for model training. We first design a sample-based algorithm to explore diverse sentences. Then we introduce several reward functions to evaluate the sampled sentences as learning signals in terms of expressive diversity and semantic fidelity, aiming to generate diverse and high-quality paraphrases. To effectively optimize model performance satisfying different evaluating aspects, we use a GradNorm-based algorithm that automatically balances these training objectives. Experiments and analyses on Quora and Twitter datasets demonstrate that our proposed method not only gains a significant increase in diversity but also improves generation quality over several state-of-the-art baselines.

pdf bib
Multi-view Classification Model for Knowledge Graph Completion
Wenbin Jiang | Mengfei Guo | Yufeng Chen | Ying Li | Jinan Xu | Yajuan Lyu | Yong Zhu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Most previous work on knowledge graph completion conducted single-view prediction or calculation for candidate triple evaluation, based only on the content information of the candidate triples. This paper describes a novel multi-view classification model for knowledge graph completion, where multiple classification views are performed based on both content and context information for candidate triple evaluation. Each classification view evaluates the validity of a candidate triple from a specific viewpoint, based on the content information inside the candidate triple and the context information nearby the triple. These classification views are implemented by a unified neural network and the classification predictions are weightedly integrated to obtain the final evaluation. Experiments show that, the multi-view model brings very significant improvements over previous methods, and achieves the new state-of-the-art on two representative datasets. We believe that, the flexibility and the scalability of the multi-view classification model facilitates the introduction of additional information and resources for better performance.

pdf bib
基于图神经网络的汉语依存分析和语义组合计算联合模型(Joint Learning Chinese Dependency Parsing and Semantic Composition based on Graph Neural Network)
Kai Wang (汪凯) | Mingtong Liu (刘明童) | Yuanmeng Chen (陈圆梦) | Yujie Zhang (张玉洁) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

组合原则表明句子的语义由其构成成分的语义按照一定规则组合而成, 由此基于句法结构的语义组合计算一直是一个重要的探索方向,其中采用树结构的组合计算方法最具有代表性。但是该方法难以应用于大规模数据处理,主要问题是其语义组合的顺序依赖于具体树的结构,无法实现并行处理。本文提出一种基于图的依存句法分析和语义组合计算的联合框架,并借助复述识别任务训练语义组合模型和句法分析模型。一方面图模型可以在训练和预测阶段采用并行处理,极大缩短计算时间;另一方面联合句法分析的语义组合框架不必依赖外部句法分析器,同时两个任务的联合学习可使语义表示同时学习句法结构和语义的上下文信息。我们在公开汉语复述识别数据集LCQMC上进行评测,实验结果显示准确率接近树结构组合方法,达到79.54%,而预测速度提升高达30倍。

pdf bib
联合依存分析的汉语语义组合模型(Chinese Semantic Composition Model with Dependency Parsing)
Yuanmeng Chen (陈圆梦) | Yujie Zhang (张玉洁) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

在语义组合方法中,结构化方法强调以结构信息指导词义表示的组合方式。现有结构化语义组合方法使用外部分析器获取句法结构信息,导致句法分析与语义组合相互割裂,句法分析的精度严重制约语义组合模型的性能,且训练数据领域不一致等问题会进一步加剧性能的下降。对此,本文提出联合依存分析的语义组合模型,将依存分析与语义组合进行联合,一方面在训练语义组合模型时对依存分析模型进行微调,使其能够更适应语义组合模型使用的训练数据的领域特点;另一方面,在语义组合部分加入依存分析的中间信息表示,获取更丰富的结构信息和语义信息,以此来降低语义组合模型对依存分析错误结果的敏感度,提升模型的鲁棒性。我们以汉语为具体研究对象,将语义组合模型用于复述识别任务,并在CTB5汉语依存分析数据和LCQMC汉语复述识别数据上验证本文提出的模型。实验结果显示,本文所提方法在复述识别任务上的预测正确率和F1值上分别达到76.81%和78.03%;我们进一步设计实验对联合学习和中间信息利用的有效性进行验证,并与相关代表性工作进行了对比分析。

pdf bib
A Joint Model for Graph-based Chinese Dependency Parsing
Xingchen Li | Mingtong Liu | Yujie Zhang | Jinan Xu | Yufeng Chen
Proceedings of the 19th Chinese National Conference on Computational Linguistics

In Chinese dependency parsing, the joint model of word segmentation, POS tagging and dependency parsing has become the mainstream framework because it can eliminate error propagation and share knowledge, where the transition-based model with feature templates maintains the best performance. Recently, the graph-based joint model (Yan et al., 2019) on word segmentation and dependency parsing has achieved better performance, demonstrating the advantages of the graph-based models. However, this work can not provide POS information for downstream tasks, and the POS tagging task was proved to be helpful to the dependency parsing according to the research of the transition-based model. Therefore, we propose a graph-based joint model for Chinese word segmentation, POS tagging and dependency parsing. We designed a charater-level POS tagging task, and then train it jointly with the model of Yan et al. (2019). We adopt two methods of joint POS tagging task, one is by sharing parameters, the other is by using tag attention mechanism, which enables the three tasks to better share intermediate information and improve each other’s performance. The experimental results on the Penn Chinese treebank (CTB5) show that our proposed joint model improved by 0.38% on dependency parsing than the model of Yan et al. (2019). Compared with the best transition-based joint model, our model improved by 0.18%, 0.35% and 5.99% respectively in terms of word segmentation, POS tagging and dependency parsing.

2019

pdf bib
CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding
Yijin Liu | Fandong Meng | Jinchao Zhang | Jie Zhou | Yufeng Chen | Jinan Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Spoken Language Understanding (SLU) mainly involves two tasks, intent detection and slot filling, which are generally modeled jointly in existing works. However, most existing models fail to fully utilize cooccurrence relations between slots and intents, which restricts their potential performance. To address this issue, in this paper we propose a novel Collaborative Memory Network (CM-Net) based on the well-designed block, named CM-block. The CM-block firstly captures slot-specific and intent-specific features from memories in a collaborative manner, and then uses these enriched features to enhance local context representations, based on which the sequential information flow leads to more specific (slot and intent) global utterance representations. Through stacking multiple CM-blocks, our CM-Net is able to alternately perform information exchange among specific memories, local contexts and the global utterance, and thus incrementally enriches each other. We evaluate the CM-Net on two standard benchmarks (ATIS and SNIPS) and a self-collected corpus (CAIS). Experimental results show that the CM-Net achieves the state-of-the-art results on the ATIS and SNIPS in most of criteria, and significantly outperforms the baseline models on the CAIS. Additionally, we make the CAIS dataset publicly available for the research community.

pdf bib
Original Semantics-Oriented Attention and Deep Fusion Network for Sentence Matching
Mingtong Liu | Yujie Zhang | Jinan Xu | Yufeng Chen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Sentence matching is a key issue in natural language inference and paraphrase identification. Despite the recent progress on multi-layered neural network with cross sentence attention, one sentence learns attention to the intermediate representations of another sentence, which are propagated from preceding layers and therefore are uncertain and unstable for matching, particularly at the risk of error propagation. In this paper, we present an original semantics-oriented attention and deep fusion network (OSOA-DFN) for sentence matching. Unlike existing models, each attention layer of OSOA-DFN is oriented to the original semantic representation of another sentence, which captures the relevant information from a fixed matching target. The multiple attention layers allow one sentence to repeatedly read the important information of another sentence for better matching. We then additionally design deep fusion to propagate the attention information at each matching layer. At last, we introduce a self-attention mechanism to capture global context to enhance attention-aware representation within each sentence. Experiment results on three sentence matching benchmark datasets SNLI, SciTail and Quora show that OSOA-DFN has the ability to model sentence matching more precisely.

pdf bib
A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis
Yunlong Liang | Fandong Meng | Jinchao Zhang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect based sentiment analysis (ABSA) aims to identify the sentiment polarity towards the given aspect in a sentence, while previous models typically exploit an aspect-independent (weakly associative) encoder for sentence representation generation. In this paper, we propose a novel Aspect-Guided Deep Transition model, named AGDT, which utilizes the given aspect to guide the sentence encoding from scratch with the specially-designed deep transition architecture. Furthermore, an aspect-oriented objective is designed to enforce AGDT to reconstruct the given aspect with the generated sentence representation. In doing so, our AGDT can accurately generate aspect-specific sentence representation, and thus conduct more accurate sentiment predictions. Experimental results on multiple SemEval datasets demonstrate the effectiveness of our proposed approach, which significantly outperforms the best reported results with the same setting.

pdf bib
GCDT: A Global Context Enhanced Deep Transition Architecture for Sequence Labeling
Yijin Liu | Fandong Meng | Jinchao Zhang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Current state-of-the-art systems for sequence labeling are typically based on the family of Recurrent Neural Networks (RNNs). However, the shallow connections between consecutive hidden states of RNNs and insufficient modeling of global information restrict the potential performance of those models. In this paper, we try to address these issues, and thus propose a Global Context enhanced Deep Transition architecture for sequence labeling named GCDT. We deepen the state transition path at each position in a sentence, and further assign every token with a global representation learned from the entire sentence. Experiments on two standard sequence labeling tasks show that, given only training data and the ubiquitous word embeddings (Glove), our GCDT achieves 91.96 F1 on the CoNLL03 NER task and 95.43 F1 on the CoNLL2000 Chunking task, which outperforms the best reported results under the same settings. Furthermore, by leveraging BERT as an additional resource, we establish new state-of-the-art results with 93.47 F1 on NER and 97.30 F1 on Chunking.

2016

pdf bib
Automatic Cross-Lingual Similarization of Dependency Grammars for Tree-based Machine Translation
Wenbin Jiang | Wen Zhang | Jinan Xu | Rangjia Cai
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
System Description of bjtu_nlp Neural Machine Translation System
Shaotong Li | JinAn Xu | Yufeng Chen | Yujie Zhang
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper presents our machine translation system that developed for the WAT2016 evalua-tion tasks of ja-en, ja-zh, en-ja, zh-ja, JPCja-en, JPCja-zh, JPCen-ja, JPCzh-ja. We build our system based on encoder–decoder framework by integrating recurrent neural network (RNN) and gate recurrent unit (GRU), and we also adopt an attention mechanism for solving the problem of information loss. Additionally, we propose a simple translation-specific approach to resolve the unknown word translation problem. Experimental results show that our system performs better than the baseline statistical machine translation (SMT) systems in each task. Moreover, it shows that our proposed approach of unknown word translation performs effec-tively improvement of translation results.

2015

pdf bib
Integrating Case Frame into Japanese to Chinese Hierarchical Phrase-based Translation Model
Jinan Xu | Jiangming Liu | Yufeng Chen | Yujie Zhang | Fang Ming | Shaotong Li
Proceedings of the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT 2015)

pdf bib
A Hybrid Transliteration Model for Chinese/English Named Entities —BJTU-NLP Report for the 5th Named Entities Workshop
Dandan Wang | Xiaohui Yang | Jinan Xu | Yufeng Chen | Nan Wang | Bojia Liu | Jian Yang | Yujie Zhang
Proceedings of the Fifth Named Entity Workshop

2014

pdf bib
System Description: Dependency-based Pre-ordering for Japanese-Chinese Machine Translation
Jingsheng Cai | Yujie Zhang | Hua Shan | Jinan Xu
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Augment Dependency-to-String Translation with Fixed and Floating Structures
Jun Xie | Jinan Xu | Qun Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet
Jiangming Liu | Jinan Xu | Yujie Zhang
Proceedings of the Sixth International Joint Conference on Natural Language Processing