Jie Fu


2021

pdf bib
GLGE: A New General Language Generation Evaluation Benchmark
Dayiheng Liu | Yu Yan | Yeyun Gong | Weizhen Qi | Hang Zhang | Jian Jiao | Weizhu Chen | Jie Fu | Linjun Shou | Ming Gong | Pengcheng Wang | Jiusheng Chen | Daxin Jiang | Jiancheng Lv | Ruofei Zhang | Winnie Wu | Ming Zhou | Nan Duan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
On Orthogonality Constraints for Transformers
Aston Zhang | Alvin Chan | Yi Tay | Jie Fu | Shuohang Wang | Shuai Zhang | Huajie Shao | Shuochao Yao | Roy Ka-Wei Lee
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Orthogonality constraints encourage matrices to be orthogonal for numerical stability. These plug-and-play constraints, which can be conveniently incorporated into model training, have been studied for popular architectures in natural language processing, such as convolutional neural networks and recurrent neural networks. However, a dedicated study on such constraints for transformers has been absent. To fill this gap, this paper studies orthogonality constraints for transformers, showing the effectiveness with empirical evidence from ten machine translation tasks and two dialogue generation tasks. For example, on the large-scale WMT’16 En→De benchmark, simply plugging-and-playing orthogonality constraints on the original transformer model (Vaswani et al., 2017) increases the BLEU from 28.4 to 29.6, coming close to the 29.7 BLEU achieved by the very competitive dynamic convolution (Wu et al., 2019).

2020

pdf bib
Interactive Machine Comprehension with Information Seeking Agents
Xingdi Yuan | Jie Fu | Marc-Alexandre Côté | Yi Tay | Chris Pal | Adam Trischler
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval and question answering (QA). We argue that this stems from the nature of MRC datasets: most of these are static environments wherein the supporting documents and all necessary information are fully observed. In this paper, we propose a simple method that reframes existing MRC datasets as interactive, partially observable environments. Specifically, we “occlude” the majority of a document’s text and add context-sensitive commands that reveal “glimpses” of the hidden text to a model. We repurpose SQuAD and NewsQA as an initial case study, and then show how the interactive corpora can be used to train a model that seeks relevant information through sequential decision making. We believe that this setting can contribute in scaling models to web-level QA scenarios.

pdf bib
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences
Yi Tay | Donovan Ong | Jie Fu | Alvin Chan | Nancy Chen | Anh Tuan Luu | Chris Pal
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Understanding human preferences, along with cultural and social nuances, lives at the heart of natural language understanding. Concretely, we present a new task and corpus for learning alignments between machine and human preferences. Our newly introduced problem is concerned with predicting the preferable options from two sentences describing scenarios that may involve social and cultural situations. Our problem is framed as a natural language inference task with crowd-sourced preference votes by human players, obtained from a gamified voting platform. We benchmark several state-of-the-art neural models, along with BERT and friends on this task. Our experimental results show that current state-of-the-art NLP models still leave much room for improvement.

pdf bib
RikiNet: Reading Wikipedia Pages for Natural Question Answering
Dayiheng Liu | Yeyun Gong | Jie Fu | Yu Yan | Jiusheng Chen | Daxin Jiang | Jiancheng Lv | Nan Duan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Reading long documents to answer open-domain questions remains challenging in natural language understanding. In this paper, we introduce a new model, called RikiNet, which reads Wikipedia pages for natural question answering. RikiNet contains a dynamic paragraph dual-attention reader and a multi-level cascaded answer predictor. The reader dynamically represents the document and question by utilizing a set of complementary attention mechanisms. The representations are then fed into the predictor to obtain the span of the short answer, the paragraph of the long answer, and the answer type in a cascaded manner. On the Natural Questions (NQ) dataset, a single RikiNet achieves 74.3 F1 and 57.9 F1 on long-answer and short-answer tasks. To our best knowledge, it is the first single model that outperforms the single human performance. Furthermore, an ensemble RikiNet obtains 76.1 F1 and 61.3 F1 on long-answer and short-answer tasks, achieving the best performance on the official NQ leaderboard.

pdf bib
Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space
Dayiheng Liu | Yeyun Gong | Jie Fu | Yu Yan | Jiusheng Chen | Jiancheng Lv | Nan Duan | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer Autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, which serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.

pdf bib
Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation
Dayiheng Liu | Yeyun Gong | Yu Yan | Jie Fu | Bo Shao | Daxin Jiang | Jiancheng Lv | Nan Duan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

News headline generation aims to produce a short sentence to attract readers to read the news. One news article often contains multiple keyphrases that are of interest to different users, which can naturally have multiple reasonable headlines. However, most existing methods focus on the single headline generation. In this paper, we propose generating multiple headlines with keyphrases of user interests, whose main idea is to generate multiple keyphrases of interest to users for the news first, and then generate multiple keyphrase-relevant headlines. We propose a multi-source Transformer decoder, which takes three sources as inputs: (a) keyphrase, (b) keyphrase-filtered article, and (c) original article to generate keyphrase-relevant, high-quality, and diverse headlines. Furthermore, we propose a simple and effective method to mine the keyphrases of interest in the news article and build a first large-scale keyphrase-aware news headline corpus, which contains over 180K aligned triples of <news article, headline, keyphrase>. Extensive experimental comparisons on the real-world dataset show that the proposed method achieves state-of-the-art results in terms of quality and diversity.

2019

pdf bib
Interactive Language Learning by Question Answering
Xingdi Yuan | Marc-Alexandre Côté | Jie Fu | Zhouhan Lin | Chris Pal | Yoshua Bengio | Adam Trischler
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Humans observe and interact with the world to acquire knowledge. However, most existing machine reading comprehension (MRC) tasks miss the interactive, information-seeking component of comprehension. Such tasks present models with static documents that contain all necessary information, usually concentrated in a single short substring. Thus, models can achieve strong performance through simple word- and phrase-based pattern matching. We address this problem by formulating a novel text-based question answering task: Question Answering with Interactive Text (QAit). In QAit, an agent must interact with a partially observable text-based environment to gather information required to answer questions. QAit poses questions about the existence, location, and attributes of objects found in the environment. The data is built using a text-based game generator that defines the underlying dynamics of interaction with the environment. We propose and evaluate a set of baseline models for the QAit task that includes deep reinforcement learning agents. Experiments show that the task presents a major challenge for machine reading systems, while humans solve it with relative ease.

pdf bib
Structure Learning for Neural Module Networks
Vardaan Pahuja | Jie Fu | Sarath Chandar | Christopher Pal
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Neural Module Networks, originally proposed for the task of visual question answering, are a class of neural network architectures that involve human-specified neural modules, each designed for a specific form of reasoning. In current formulations of such networks only the parameters of the neural modules and/or the order of their execution is learned. In this work, we further expand this approach and also learn the underlying internal structure of modules in terms of the ordering and combination of simple and elementary arithmetic operators. We utilize a minimum amount of prior knowledge from the human-specified neural modules in the form of different input types and arithmetic operators used in these modules. Our results show that one is indeed able to simultaneously learn both internal module structure and module sequencing without extra supervisory signals for module execution sequencing. With this approach, we report performance comparable to models using hand-designed modules. In addition, we do a analysis of sensitivity of the learned modules w.r.t. the arithmetic operations and infer the analytical expressions of the learned modules.

pdf bib
Graph Neural Networks with Generated Parameters for Relation Extraction
Hao Zhu | Yankai Lin | Zhiyuan Liu | Jie Fu | Tat-Seng Chua | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a novel graph neural network with generated parameters (GP-GNNs). The parameters in the propagation module, i.e. the transition matrices used in message passing procedure, are produced by a generator taking natural language sentences as inputs. We verify GP-GNNs in relation extraction from text, both on bag- and instance-settings. Experimental results on a human-annotated dataset and two distantly supervised datasets show that multi-hop reasoning mechanism yields significant improvements. We also perform a qualitative analysis to demonstrate that our model could discover more accurate relations by multi-hop relational reasoning.

pdf bib
Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks
Yi Tay | Aston Zhang | Anh Tuan Luu | Jinfeng Rao | Shuai Zhang | Shuohang Wang | Jie Fu | Siu Cheung Hui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Many state-of-the-art neural models for NLP are heavily parameterized and thus memory inefficient. This paper proposes a series of lightweight and memory efficient neural architectures for a potpourri of natural language processing (NLP) tasks. To this end, our models exploit computation using Quaternion algebra and hypercomplex spaces, enabling not only expressive inter-component interactions but also significantly (75%) reduced parameter size due to lesser degrees of freedom in the Hamilton product. We propose Quaternion variants of models, giving rise to new architectures such as the Quaternion attention Model and Quaternion Transformer. Extensive experiments on a battery of NLP tasks demonstrates the utility of proposed Quaternion-inspired models, enabling up to 75% reduction in parameter size without significant loss in performance.

pdf bib
TIGS: An Inference Algorithm for Text Infilling with Gradient Search
Dayiheng Liu | Jie Fu | Pengfei Liu | Jiancheng Lv
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Text infilling aims at filling in the missing part of a sentence or paragraph, which has been applied to a variety of real-world natural language generation scenarios. Given a well-trained sequential generative model, it is challenging for its unidirectional decoder to generate missing symbols conditioned on the past and future information around the missing part. In this paper, we propose an iterative inference algorithm based on gradient search, which could be the first inference algorithm that can be broadly applied to any neural sequence generative models for text infilling tasks. Extensive experimental comparisons show the effectiveness and efficiency of the proposed method on three different text infilling tasks with various mask ratios and different mask strategies, comparing with five state-of-the-art methods.

pdf bib
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
Yi Tay | Shuohang Wang | Anh Tuan Luu | Jie Fu | Minh C. Phan | Xingdi Yuan | Jinfeng Rao | Siu Cheung Hui | Aston Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper tackles the problem of reading comprehension over long narratives where documents easily span over thousands of tokens. We propose a curriculum learning (CL) based Pointer-Generator framework for reading/sampling over large documents, enabling diverse training of the neural model based on the notion of alternating contextual difficulty. This can be interpreted as a form of domain randomization and/or generative pretraining during training. To this end, the usage of the Pointer-Generator softens the requirement of having the answer within the context, enabling us to construct diverse training samples for learning. Additionally, we propose a new Introspective Alignment Layer (IAL), which reasons over decomposed alignments using block-based self-attention. We evaluate our proposed method on the NarrativeQA reading comprehension benchmark, achieving state-of-the-art performance, improving existing baselines by 51% relative improvement on BLEU-4 and 17% relative improvement on Rouge-L. Extensive ablations confirm the effectiveness of our proposed IAL and CL components.