ConQuest: Contextual Question Paraphrasing through Answer-Aware Synthetic Question Generation
Mostafa Mirshekari | Jing Gu | Aaron Sisto
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Despite excellent performance on tasks such as question answering, Transformer-based architectures remain sensitive to syntactic and contextual ambiguities. Question paraphrasing (QP) offers a promising way to augment existing datasets. The main challenges for current QP models are the lack of training data and the difficulty of generating diverse and natural questions. In this paper, we present ConQuest, a framework for generating synthetic datasets for contextual question paraphrasing. ConQuest first employs an answer-aware question generation (QG) model to create a question-pair dataset and then uses this data to train a contextualized question paraphrasing model. We extensively evaluate ConQuest and show that it produces more diverse and fluent question pairs than existing approaches. Our contextual paraphrase model also establishes a strong baseline for end-to-end contextual paraphrasing. Further, we find that context can improve the BLEU-1 score on contextual compression and expansion by 4.3 and 11.2, respectively, compared to a non-contextual model.
ChainCQG: Flow-Aware Conversational Question Generation
Jing Gu | Mostafa Mirshekari | Zhou Yu | Aaron Sisto
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Conversational systems enable numerous valuable applications, and question answering is an important component underlying many of them. However, conversational question answering remains challenging due to the lack of realistic, domain-specific training data. Motivated by this bottleneck, we focus on conversational question generation as a means to generate synthetic conversations for training and evaluation purposes. We present a number of novel strategies to improve conversational flow and overall fluidity and to accommodate varying question types. Specifically, we design ChainCQG as a two-stage architecture that learns question-answer representations across multiple dialogue turns using a flow propagation training strategy. ChainCQG significantly outperforms both answer-aware and answer-unaware state-of-the-art (SOTA) baselines (e.g., up to 48% BLEU-1 improvement). Additionally, our model is able to generate different types of questions, with improved fluidity and coreference alignment.