Qipeng Guo


pdf bib
A Unified Generative Framework for Various NER Subtasks
Hang Yan | Tao Gui | Junqi Dai | Qipeng Guo | Zheng Zhang | Xipeng Qiu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Named Entity Recognition (NER) is the task of identifying spans that represent entities in sentences. Whether the entity spans are nested or discontinuous, the NER task can be categorized into the flat NER, nested NER, and discontinuous NER subtasks. These subtasks have been mainly solved by the token-level sequence labelling or span-level classification. However, these solutions can hardly tackle the three kinds of NER subtasks concurrently. To that end, we propose to formulate the NER subtasks as an entity span sequence generation task, which can be solved by a unified sequence-to-sequence (Seq2Seq) framework. Based on our unified framework, we can leverage the pre-trained Seq2Seq model to solve all three kinds of NER subtasks without the special design of the tagging schema or ways to enumerate spans. We exploit three types of entity representations to linearize entities into a sequence. Our proposed framework is easy-to-implement and achieves state-of-the-art (SoTA) or near SoTA performance on eight English NER datasets, including two flat NER datasets, three nested NER datasets, and three discontinuous NER datasets.


pdf bib
GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation
Zhijing Jin | Qipeng Guo | Xipeng Qiu | Zheng Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Data collection for the knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has emerged as an active field recently. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrain their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text and graph examples, respectively. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

pdf bib
CoLAKE: Contextualized Language and Knowledge Embedding
Tianxiang Sun | Yunfan Shao | Xipeng Qiu | Qipeng Guo | Yaru Hu | Xuanjing Huang | Zheng Zhang
Proceedings of the 28th International Conference on Computational Linguistics

With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models. Few works explore the potential of deep contextualized knowledge representation when injecting knowledge. In this paper, we propose the Contextualized Language and Knowledge Embedding (CoLAKE), which jointly learns contextualized representation for both language and knowledge with the extended MLM objective. Instead of injecting only entity embeddings, CoLAKE extracts the knowledge context of an entity from large-scale knowledge bases. To handle the heterogeneity of knowledge context and language context, we integrate them in a unified data structure, word-knowledge graph (WK graph). CoLAKE is pre-trained on large-scale WK graphs with the modified Transformer encoder. We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks. Experimental results show that CoLAKE outperforms previous counterparts on most of the tasks. Besides, CoLAKE achieves surprisingly high performance on our synthetic task called word-knowledge graph completion, which shows the superiority of simultaneously contextualizing language and knowledge representation.

pdf bib
CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training
Qipeng Guo | Zhijing Jin | Xipeng Qiu | Weinan Zhang | David Wipf | Zheng Zhang
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

Two important tasks at the intersection of knowledge graphs and natural language processing are graph-to-text (G2T) and text-tograph (T2G) conversion. Due to the difficulty and high cost of data collection, the supervised data available in the two fields are usually on the magnitude of tens of thousands, for example, 18K in the WebNLG 2017 dataset after preprocessing, which is far fewer than the millions of data for other tasks such as machine translation. Consequently, deep learning models for G2T and T2G suffer largely from scarce training data. We present CycleGT, an unsupervised training method that can bootstrap from fully non-parallel graph and text data, and iteratively back translate between the two forms. Experiments on WebNLG datasets show that our unsupervised model trained on the same number of data achieves performance on par with several fully supervised models. Further experiments on the non-parallel GenWiki dataset verify that our method performs the best among unsupervised baselines. This validates our framework as an effective approach to overcome the data scarcity problem in the fields of G2T and T2G.

pdf bib
𝒫2: A Plan-and-Pretrain Approach for Knowledge Graph-to-Text Generation
Qipeng Guo | Zhijing Jin | Ning Dai | Xipeng Qiu | Xiangyang Xue | David Wipf | Zheng Zhang
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

Text verbalization of knowledge graphs is an important problem with wide application to natural language generation (NLG) systems. It is challenging because the generated text not only needs to be grammatically correct (fluency), but also has to contain the given structured knowledge input (relevance) and meet some other criteria. We develop a plan-and-pretrain approach, 𝒫2, which consists of a relational graph convolutional network (RGCN) planner and the pretrained sequence-tosequence (Seq2Seq) model T5. Specifically, the R-GCN planner first generates an order of the knowledge graph triplets, corresponding to the order that they will be mentioned in text, and then T5 produces the surface realization of the given plan. In the WebNLG+ 2020 Challenge, our submission ranked in 1st place on all automatic and human evaluation criteria of the English RDF-to-text generation task.

pdf bib
BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Linyang Li | Ruotian Ma | Qipeng Guo | Xiangyang Xue | Xipeng Qiu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Adversarial attacks for discrete data (such as texts) have been proved significantly more challenging than continuous data (such as images) since it is difficult to generate adversarial samples with gradient-based methods. Current successful attack methods for texts usually adopt heuristic replacement strategies on the character or word level, which remains challenging to find the optimal solution in the massive space of possible combinations of replacements while preserving semantic consistency and language fluency. In this paper, we propose BERT-Attack, a high-quality and effective method to generate adversarial samples using pre-trained masked language models exemplified by BERT. We turn BERT against its fine-tuned models and other deep neural models in downstream tasks so that we can successfully mislead the target models to predict incorrectly. Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage, while the generated adversarial samples are fluent and semantically preserved. Also, the cost of calculation is low, thus possible for large-scale generations. The code is available at https://github.com/LinyangLee/BERT-Attack.


pdf bib
Qipeng Guo | Xipeng Qiu | Pengfei Liu | Yunfan Shao | Xiangyang Xue | Zheng Zhang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Although Transformer has achieved great successes on many NLP tasks, its heavy structure with fully-connected attention connections leads to dependencies on large training data. In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving the capacity to capture both local composition and long-range dependency. The experiments on four tasks (22 datasets) show that Star-Transformer achieved significant improvements against the standard Transformer for the modestly sized datasets.