Ying Liu


2021

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce
Song Xu | Haoran Li | Peng Yuan | Yujia Wang | Youzheng Wu | Xiaodong He | Ying Liu | Bowen Zhou
Findings of the Association for Computational Linguistics: EMNLP 2021

Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as e-commerce scenarios. In this paper, we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder Transformer that can be transferred to both natural language understanding and generation tasks. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, including e-commerce domain-specific knowledge bases and the aspects, categories, and unique selling propositions of product entities. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue. K-PLUG significantly outperforms baselines across the board, which demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks. Our code is available.
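As a rough illustration of what a knowledge-aware pre-training objective can look like at the data-preparation level (this is not K-PLUG's actual implementation; the masking scheme, function names, and example data below are invented), spans known to carry product-aspect knowledge can be selected for masking, so that the model must reconstruct domain knowledge rather than arbitrary tokens:

```python
MASK = "[MASK]"

def mask_aspect_spans(tokens, aspect_terms):
    """Replace known aspect tokens with [MASK]; return input and targets.

    tokens: list of words; aspect_terms: set of domain-knowledge terms
    (e.g. product aspects from an e-commerce knowledge base).
    """
    masked, targets = [], []
    for tok in tokens:
        if tok in aspect_terms:
            masked.append(MASK)
            targets.append(tok)   # the model must recover the aspect term
        else:
            masked.append(tok)
    return masked, targets

tokens = "this jacket has a waterproof shell and a detachable hood".split()
aspects = {"waterproof", "shell", "detachable", "hood"}
inp, tgt = mask_aspect_spans(tokens, aspects)
print(tgt)   # ['waterproof', 'shell', 'detachable', 'hood']
```

In an encoder-decoder setup, `inp` would be fed to the encoder and the decoder trained to generate the masked aspect terms; the point of the sketch is only that the masking is knowledge-driven rather than random.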

SaGE: 基于句法感知图卷积神经网络和ELECTRA的中文隐喻识别模型(SaGE: Syntax-aware GCN with ELECTRA for Chinese Metaphor Detection)
Shenglong Zhang (张声龙) | Ying Liu (刘颖) | Yanjun Ma (马艳军)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

Metaphor is a special phenomenon that appears frequently in human language, and metaphor detection is of fundamental importance to many natural language processing tasks. For Chinese metaphor detection, we propose a model based on a syntax-aware graph convolutional network and ELECTRA (Syntax-aware GCN with ELECTRA, SaGE). Starting from linguistics, the model uses ELECTRA and a Transformer encoder to extract the semantic features of a sentence, organizes the sentence into a graph according to its dependency relations and uses a graph convolutional network to extract its syntactic features, and then fuses the two kinds of features for metaphor detection. Our model surpasses the previous best result on the CCL 2018 Chinese metaphor detection dataset with a macro-averaged F1 score of 85.22%, verifying that fusing semantic and syntactic information plays an important role in metaphor detection.
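The graph-convolution half of such a model can be sketched as follows (this is not the authors' code; the toy dependency arcs, dimensions, and normalization choice are invented): build an adjacency matrix from the sentence's dependency arcs and apply one GCN layer.

```python
import numpy as np

def gcn_layer(H, edges, W):
    """One graph-convolution layer over a dependency graph.

    H: (n, d) token features; edges: list of (head, dependent) arcs,
    treated as undirected; W: (d, d) weight matrix.
    """
    n = H.shape[0]
    A = np.eye(n)                 # self-loops so each node keeps its own features
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0   # undirected dependency arcs
    deg = A.sum(axis=1)
    A_hat = A / deg[:, None]      # simple row normalization
    return np.tanh(A_hat @ H @ W)

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))           # 5 tokens, 8-dim features
edges = [(1, 0), (1, 2), (1, 4), (4, 3)]  # toy dependency tree
out = gcn_layer(H, edges, rng.standard_normal((8, 8)) * 0.1)
print(out.shape)  # (5, 8)
```

The resulting syntactic features would then be fused with semantic features from a pre-trained encoder before classification.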

Native Language Identification and Reconstruction of Native Language Relationship Using Japanese Learner Corpus
Mitsuhiro Nishijima | Ying Liu
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

2020

Modularized Syntactic Neural Networks for Sentence Classification
Haiyan Wu | Ying Liu | Shaoyun Shi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper focuses on tree-based modeling for the sentence classification task. In existing works, aggregation over a syntax tree usually considers only the local information of sub-trees. In contrast, in addition to the local information, our proposed Modularized Syntactic Neural Network (MSNN) utilizes the syntax category labels and takes advantage of the global context while modeling sub-trees. In MSNN, each node of a syntax tree is modeled by a label-related syntax module. Each syntax module aggregates the outputs of lower-level modules, and finally, the root module provides the sentence representation. We design a tree-parallel mini-batch strategy for efficient training and prediction. Experimental results on four benchmark datasets show that our MSNN significantly outperforms previous state-of-the-art tree-based methods on the sentence classification task.
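The bottom-up, label-conditioned aggregation the abstract describes can be illustrated with a minimal sketch (not the authors' implementation; the syntax labels, toy tree, and the choice of a single linear map per label are invented): each syntax-category label owns a small module, and a node's representation is its label's module applied to the sum of its children's representations.

```python
import numpy as np

DIM = 4
rng = np.random.default_rng(0)

# One tiny linear "module" per syntax-category label (hypothetical labels).
modules = {label: rng.standard_normal((DIM, DIM)) * 0.1
           for label in ("S", "NP", "VP", "WORD")}

def encode(node):
    """Recursively encode a (label, payload) tree bottom-up.

    Leaves ("WORD") carry a word vector as payload; internal nodes carry
    a list of children, whose encodings are summed and passed through
    the module for the node's label.
    """
    label, payload = node
    if label == "WORD":
        child_sum = payload
    else:
        child_sum = np.sum([encode(c) for c in payload], axis=0)
    return np.tanh(modules[label] @ child_sum)

def word():
    return ("WORD", rng.standard_normal(DIM))

# Toy parse: (S (NP word) (VP word word))
tree = ("S", [("NP", [word()]), ("VP", [word(), word()])])
sentence_vec = encode(tree)  # root module output = sentence representation
print(sentence_vec.shape)    # (4,)
```

In the paper's setting the modules would be trained jointly and batched with the tree-parallel strategy; the sketch only shows the label-indexed recursive composition.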

用计量风格学方法考察《水浒传》的作者争议问题——以罗贯中《平妖传》为参照(Quantitative Stylistics Based Research on the Controversy of the Author of “Tales of the Marshes”: Comparing with “Pingyaozhuan” of Luo Guanzhong)
Li Song (宋丽) | Ying Liu (刘颖)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Whether "Tales of the Marshes" (Shui Hu Zhuan) was written by a single author or jointly, and what the relationship between Shi Nai'an and Luo Guanzhong was, have long been disputed. This paper roughly classifies the authorship controversy into five hypotheses: written by Shi Nai'an; written by Luo Guanzhong; begun by Shi and continued by Luo; begun by Luo and continued by others; and written by Shi and revised by Luo. Taking Luo Guanzhong's "Pingyaozhuan" as a reference, we examine the writing style of "Tales of the Marshes" using hypothesis testing, text clustering, text classification, and stylometric fluctuation analysis, combined with an analysis of the text's content, in an attempt to provide evidence for its authorship attribution. The results show that only the "begun by Luo and continued by others" hypothesis is likely, that is, the first 70 chapters were written by Luo Guanzhong and the rest continued by someone else, while the other four hypotheses are all less likely.
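One of the listed techniques, stylometric comparison via hypothesis testing, can be sketched as a Pearson chi-square statistic over function-word counts in two text segments (the word counts below are invented toy numbers, not measurements from the novels):

```python
def chi_square(counts_a, counts_b):
    """Pearson chi-square statistic comparing two frequency profiles.

    counts_a, counts_b: dicts mapping function words to raw counts in
    two text segments. A larger statistic suggests a larger stylistic
    difference between the segments.
    """
    words = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    stat = 0.0
    for w in words:
        a, b = counts_a.get(w, 0), counts_b.get(w, 0)
        # Expected counts under the null hypothesis of a shared style.
        expected_a = (a + b) * total_a / (total_a + total_b)
        expected_b = (a + b) * total_b / (total_a + total_b)
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat

# Toy function-word counts for two chapter blocks (invented numbers).
first_block = {"之": 120, "乎": 30, "者": 80, "也": 60}
second_block = {"之": 90, "乎": 55, "者": 70, "也": 40}
print(round(chi_square(first_block, second_block), 2))
```

The statistic would then be compared against a chi-square distribution with |words| - 1 degrees of freedom to decide whether the style difference is significant.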

2019

Relation Extraction with Temporal Reasoning Based on Memory Augmented Distant Supervision
Jianhao Yan | Lin He | Ruqin Huang | Jian Li | Ying Liu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Distant supervision (DS) is an important paradigm for automatically extracting relations. It utilizes an existing knowledge base to collect examples for the relation we intend to extract, and then uses these examples to automatically generate the training data. However, the examples collected can be very noisy and pose a significant challenge for obtaining high-quality labels. Previous work has made remarkable progress in predicting relations from distant supervision, but typically ignores the temporal relations among the supervising instances. This paper formulates the problem of relation extraction with temporal reasoning and proposes a solution to predict whether two given entities participate in a relation at a given time spot. For this purpose, we construct a dataset called WIKI-TIME, which additionally includes the valid period of each relation between two entities in the knowledge base. We propose a novel neural model to incorporate both temporal information encoding and sequential reasoning. The experimental results show that, compared with the best of the existing models, our model achieves better performance on both the WIKI-TIME dataset and the well-studied NYT-10 dataset.
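At the knowledge-base level, the prediction target, whether a triple (head, relation, tail) holds at a given time spot, reduces to an interval-membership check against the stored valid period. A minimal sketch (the data layout and field names are invented; the actual task is to predict this from noisy text, not to look it up):

```python
# Each fact stores the relation's valid period as (start, end) years
# (invented toy data for illustration).
facts = {
    ("BarackObama", "presidentOf", "USA"): (2009, 2017),
    ("AngelaMerkel", "chancellorOf", "Germany"): (2005, 2021),
}

def holds_at(head, relation, tail, year):
    """True iff the triple is recorded as valid at the given time spot."""
    period = facts.get((head, relation, tail))
    return period is not None and period[0] <= year <= period[1]

print(holds_at("BarackObama", "presidentOf", "USA", 2012))  # True
print(holds_at("BarackObama", "presidentOf", "USA", 2020))  # False
```

Such time-bounded gold labels are what make the dataset harder than standard DS benchmarks: the same entity pair can be a positive or a negative example depending on the queried time spot.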

2015

A Corpus-Based Study of zunshou and Its English Equivalents
Ying Liu
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

A Corpus-Based Quantitative Study of Nominalizations across Chinese and British Media English
Ying Liu | Alex Chengyu Fang | Naixing Wei
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2013

UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts
Bridget McInnes | Ted Pedersen | Serguei Pakhomov | Ying Liu | Genevieve Melton-Meaux
Proceedings of the 2013 NAACL HLT Demonstration Session

2011

Using Second-order Vectors in a Knowledge-based Method for Acronym Disambiguation
Bridget T. McInnes | Ted Pedersen | Ying Liu | Serguei V. Pakhomov | Genevieve B. Melton
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

The Ngram Statistics Package (Text::NSP) : A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations
Ted Pedersen | Satanjeev Banerjee | Bridget McInnes | Saiyam Kohli | Mahesh Joshi | Ying Liu
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World