Towards building a Robust Industry-scale Question Answering System

Rishav Chakravarti, Anthony Ferritto, Bhavani Iyer, Lin Pan, Radu Florian, Salim Roukos, Avi Sil


Abstract
Industry-scale NLP systems require two features: (1) Robustness: “zero-shot transfer learning” (ZSTL) performance has to be commendable, and (2) Efficiency: systems have to train efficiently and respond instantaneously. In this paper, we introduce the development of a production model called GAAMA (Go Ahead Ask Me Anything), which possesses both characteristics. For robustness, it trains on the recently introduced Natural Questions (NQ) dataset. NQ poses additional challenges over older datasets like SQuAD: (a) QA systems need to read and comprehend an entire Wikipedia article rather than a small passage, and (b) NQ does not suffer from observation bias during construction, resulting in less lexical overlap between the question and the article. GAAMA combines Attention-over-Attention, diversity among attention heads, hierarchical transfer learning, and synthetic data augmentation while remaining computationally inexpensive. Building on top of the powerful BERTQA model, GAAMA provides a ∼2.0% absolute boost in F1 over the industry-scale state-of-the-art (SOTA) system on NQ. Further, we show that GAAMA transfers zero-shot to unseen, important real-life domains, as it yields respectable performance on two benchmarks: the BioASQ and the newly introduced CovidQA datasets.
Anthology ID:
2020.coling-industry.9
Volume:
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track
Month:
December
Year:
2020
Address:
Online
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
90–101
URL:
https://aclanthology.org/2020.coling-industry.9
DOI:
10.18653/v1/2020.coling-industry.9
Cite (ACL):
Rishav Chakravarti, Anthony Ferritto, Bhavani Iyer, Lin Pan, Radu Florian, Salim Roukos, and Avi Sil. 2020. Towards building a Robust Industry-scale Question Answering System. In Proceedings of the 28th International Conference on Computational Linguistics: Industry Track, pages 90–101, Online. International Committee on Computational Linguistics.
Cite (Informal):
Towards building a Robust Industry-scale Question Answering System (Chakravarti et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-industry.9.pdf
Data
BioASQ, CovidQA, Natural Questions, SQuAD