Zhiqing Sun


2023

pdf bib
Active Retrieval Augmented Generation
Zhengbao Jiang | Frank Xu | Luyu Gao | Zhiqing Sun | Qian Liu | Jane Dwivedi-Yu | Yiming Yang | Jamie Callan | Graham Neubig
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Despite the remarkable ability of large language models (LMs) to comprehend and generate language, they have a tendency to hallucinate and create factually inaccurate output. Augmenting LMs by retrieving information from external knowledge resources is one promising solution. Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout generation is essential. In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. We test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks/datasets. FLARE achieves superior or competitive performance on all tasks, demonstrating the effectiveness of our method.

2022

pdf bib
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen Bach | Victor Sanh | Zheng Xin Yong | Albert Webson | Colin Raffel | Nihal V. Nayak | Abheesht Sharma | Taewoon Kim | M Saiful Bari | Thibault Fevry | Zaid Alyafeai | Manan Dey | Andrea Santilli | Zhiqing Sun | Srulik Ben-david | Canwen Xu | Gunjan Chhablani | Han Wang | Jason Fries | Maged Al-shaibani | Shanya Sharma | Urmish Thakker | Khalid Almubarak | Xiangru Tang | Dragomir Radev | Mike Tian-jian Jiang | Alexander Rush
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3) a community-driven set of guidelines for contributing new prompts to a common pool. Over 2,000 prompts for roughly 170 datasets are already available in PromptSource. PromptSource is available at https://github.com/bigscience-workshop/promptsource.

2020

pdf bib
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun | Hongkun Yu | Xiaodan Song | Renjie Liu | Yiming Yang | Denny Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).

pdf bib
A Re-evaluation of Knowledge Graph Completion Methods
Zhiqing Sun | Shikhar Vashishth | Soumya Sanyal | Partha Talukdar | Yiming Yang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Knowledge Graph Completion (KGC) aims at automatically predicting missing links for large-scale knowledge graphs. A vast number of state-of-the-art KGC techniques have got published at top conferences in several research fields, including data mining, machine learning, and natural language processing. However, we notice that several recent papers report very high performance, which largely outperforms previous state-of-the-art methods. In this paper, we find that this can be attributed to the inappropriate evaluation protocol used by them and propose a simple evaluation protocol to address this problem. The proposed protocol is robust to handle bias in the model, which can substantially affect the final results. We conduct extensive experiments and report performance of several existing methods using our protocol. The reproducible code has been made publicly available.

2018

pdf bib
Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling
Zhiqing Sun | Zhi-Hong Deng
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Previous traditional approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former uses the carefully designed goodness measures for candidate segmentation, while the latter focuses on finding the optimal segmentation of the highest generative probability. However, while there exists a trivial way to extend the discriminative models into neural version by using neural language models, those of generative ones are non-trivial. In this paper, we propose the segmental language models (SLMs) for CWS. Our approach explicitly focuses on the segmental nature of Chinese, as well as preserves several properties of language models. In SLMs, a context encoder encodes the previous context and a segment decoder generates each segment incrementally. As far as we know, we are the first to propose a neural model for unsupervised CWS and achieve competitive performance to the state-of-the-art statistical models on four different datasets from SIGHAN 2005 bakeoff.