Xuan Liu


pdf bib
Cross-lingual Text Classification with Heterogeneous Graph Neural Network
Ziyun Wang | Xuan Liu | Peiji Yang | Shixing Liu | Zhisheng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages, which is very useful for low-resource languages. Recent multilingual pretrained language models (mPLM) achieve impressive results in cross-lingual classification tasks, but rarely consider factors beyond semantic similarity, causing performance degradation between some language pairs. In this paper we propose a simple yet effective method to incorporate heterogeneous information within and across languages for cross-lingual text classification using graph convolutional networks (GCN). In particular, we construct a heterogeneous graph by treating documents and words as nodes, and linking nodes with different relations, which include part-of-speech roles, semantic similarity, and document translations. Extensive experiments show that our graph-based method significantly outperforms state-of-the-art models on all tasks, and also achieves consistent performance gain over baselines in low-resource settings where external tools like translators are unavailable.


pdf bib
DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications
Wei He | Kai Liu | Jing Liu | Yajuan Lyu | Shiqi Zhao | Xinyan Xiao | Yuan Liu | Yizhong Wang | Hua Wu | Qiaoqiao She | Xuan Liu | Tian Wu | Haifeng Wang
Proceedings of the Workshop on Machine Reading for Question Answering

This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, that leaves more opportunity for the research community. (3) scale: it contains 200K questions, 420K answers and 1M documents; it is the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there are significant improvements over the baselines.

pdf bib
Binarized LSTM Language Model
Xuan Liu | Di Cao | Kai Yu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Long short-term memory (LSTM) language model (LM) has been widely investigated for automatic speech recognition (ASR) and natural language processing (NLP). Although excellent performance is obtained for large vocabulary tasks, tremendous memory consumption prohibits the use of LSTM LM in low-resource devices. The memory consumption mainly comes from the word embedding layer. In this paper, a novel binarized LSTM LM is proposed to address the problem. Words are encoded into binary vectors and other LSTM parameters are further binarized to achieve high memory compression. This is the first effort to investigate binary LSTM for large vocabulary LM. Experiments on both English and Chinese LM and ASR tasks showed that can achieve a compression ratio of 11.3 without any loss of LM and ASR performances and a compression ratio of 31.6 with acceptable minor performance degradation.


pdf bib
Multi-view Response Selection for Human-Computer Conversation
Xiangyang Zhou | Daxiang Dong | Hua Wu | Shiqi Zhao | Dianhai Yu | Hao Tian | Xuan Liu | Rui Yan
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing