A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we also highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.
Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network for sequence modeling, that utilizes an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new optimization scheme, memory replay back-propagation (MRBP), which promotes long-range back-propagation through time with a significantly reduced memory requirement. Experimental results show that Memformer has achieved comparable performance compared against the baselines by using 8.1x less memory space and 3.2x faster on inference. Analysis of the attention pattern shows that our external memory slots can encode and retain important information through timesteps.
Despite excellent performance on tasks such as question answering, Transformer-based architectures remain sensitive to syntactic and contextual ambiguities. Question Paraphrasing (QP) offers a promising solution as a means to augment existing datasets. The main challenges of current QP models include lack of training data and difficulty in generating diverse and natural questions. In this paper, we present Conquest, a framework for generating synthetic datasets for contextual question paraphrasing. To this end, Conquest first employs an answer-aware question generation (QG) model to create a question-pair dataset and then uses this data to train a contextualized question paraphrasing model. We extensively evaluate Conquest and show its ability to produce more diverse and fluent question pairs than existing approaches. Our contextual paraphrase model also establishes a strong baseline for end-to-end contextual paraphrasing. Further, We find that context can improve BLEU-1 score on contextual compression and expansion by 4.3 and 11.2 respectively, compared to a non-contextual model.
Conversational systems enable numerous valuable applications, and question-answering is an important component underlying many of these. However, conversational question-answering remains challenging due to the lack of realistic, domain-specific training data. Inspired by this bottleneck, we focus on conversational question generation as a means to generate synthetic conversations for training and evaluation purposes. We present a number of novel strategies to improve conversational flow and accommodate varying question types and overall fluidity. Specifically, we design ChainCQG as a two-stage architecture that learns question-answer representations across multiple dialogue turns using a flow propagation training strategy. ChainCQG significantly outperforms both answer-aware and answer-unaware SOTA baselines (e.g., up to 48% BLEU-1 improvement). Additionally, our model is able to generate different types of questions, with improved fluidity and coreference alignment.
Large pre-trained language generation models such as GPT-2 have demonstrated their effectiveness as language priors by reaching state-of-the-art results in various language generation tasks. However, the performance of pre-trained models on task-oriented dialog tasks is still under-explored. We propose a Pre-trainedRole Alternating Language model (PRAL), explicitly designed for task-oriented conversational systems. We design several techniques: start position randomization, knowledge distillation, and history discount to improve pre-training performance. In addition, we introduce a high-quality large-scale task-oriented dialog pre-training dataset by post-prossessing13 dialog datasets. We effectively adapt PRALon three downstream tasks. The results show that PRAL outperforms or is on par with state-of-the-art models.
There is a huge performance gap between formal and informal language understanding tasks. The recent pre-trained models that improved formal language understanding tasks did not achieve a comparable result on informal language. We propose data annealing transfer learning procedure to bridge the performance gap on informal natural language understanding tasks. It successfully utilizes a pre-trained model such as BERT in informal language. In the data annealing procedure, the training set contains mainly formal text data at first; then, the proportion of the informal text data is gradually increased during the training process. Our data annealing procedure is model-independent and can be applied to various tasks. We validate its effectiveness in exhaustive experiments. When BERT is implemented with our learning procedure, it outperforms all the state-of-the-art models on the three common informal language tasks.