@inproceedings{khosla-etal-2021-team,
title = "Team {JARS}: {D}ial{D}oc Subtask 1 - Improved Knowledge Identification with Supervised Out-of-Domain Pretraining",
author = "Khosla, Sopan and
Lovelace, Justin and
Dutt, Ritam and
Pratapa, Adithya",
editor = "Feng, Song and
Reddy, Siva and
Alikhani, Malihe and
He, He and
Ji, Yangfeng and
Iyyer, Mohit and
Yu, Zhou",
booktitle = "Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.dialdoc-1.13",
doi = "10.18653/v1/2021.dialdoc-1.13",
pages = "103--108",
abstract = "In this paper, we discuss our submission for DialDoc subtask 1. The subtask requires systems to extract, from FAQ-type documents, the knowledge vital to replying to a user{'}s query in a conversational setting. We experiment with pretraining a BERT-based question-answering model on different QA datasets from MRQA, as well as on conversational QA datasets like CoQA and QuAC. Our results show that models pretrained on CoQA and QuAC outperform their counterparts pretrained on MRQA datasets. Our results also indicate that adding more pretraining data does not necessarily improve performance. Our final model, an ensemble of ALBERT-XL models pretrained independently on CoQA and QuAC that selects the answer with the highest average probability score, achieves an F1 score of 70.9{\%} on the official test set.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="khosla-etal-2021-team">
<titleInfo>
<title>Team JARS: DialDoc Subtask 1 - Improved Knowledge Identification with Supervised Out-of-Domain Pretraining</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sopan</namePart>
<namePart type="family">Khosla</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Justin</namePart>
<namePart type="family">Lovelace</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ritam</namePart>
<namePart type="family">Dutt</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Adithya</namePart>
<namePart type="family">Pratapa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Song</namePart>
<namePart type="family">Feng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Siva</namePart>
<namePart type="family">Reddy</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Malihe</namePart>
<namePart type="family">Alikhani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">He</namePart>
<namePart type="family">He</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yangfeng</namePart>
<namePart type="family">Ji</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Iyyer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhou</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Online</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In this paper, we discuss our submission for DialDoc subtask 1. The subtask requires systems to extract, from FAQ-type documents, the knowledge vital to replying to a user’s query in a conversational setting. We experiment with pretraining a BERT-based question-answering model on different QA datasets from MRQA, as well as on conversational QA datasets like CoQA and QuAC. Our results show that models pretrained on CoQA and QuAC outperform their counterparts pretrained on MRQA datasets. Our results also indicate that adding more pretraining data does not necessarily improve performance. Our final model, an ensemble of ALBERT-XL models pretrained independently on CoQA and QuAC that selects the answer with the highest average probability score, achieves an F1 score of 70.9% on the official test set.</abstract>
<identifier type="citekey">khosla-etal-2021-team</identifier>
<identifier type="doi">10.18653/v1/2021.dialdoc-1.13</identifier>
<location>
<url>https://aclanthology.org/2021.dialdoc-1.13</url>
</location>
<part>
<date>2021-08</date>
<extent unit="page">
<start>103</start>
<end>108</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Team JARS: DialDoc Subtask 1 - Improved Knowledge Identification with Supervised Out-of-Domain Pretraining
%A Khosla, Sopan
%A Lovelace, Justin
%A Dutt, Ritam
%A Pratapa, Adithya
%Y Feng, Song
%Y Reddy, Siva
%Y Alikhani, Malihe
%Y He, He
%Y Ji, Yangfeng
%Y Iyyer, Mohit
%Y Yu, Zhou
%S Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F khosla-etal-2021-team
%X In this paper, we discuss our submission for DialDoc subtask 1. The subtask requires systems to extract, from FAQ-type documents, the knowledge vital to replying to a user’s query in a conversational setting. We experiment with pretraining a BERT-based question-answering model on different QA datasets from MRQA, as well as on conversational QA datasets like CoQA and QuAC. Our results show that models pretrained on CoQA and QuAC outperform their counterparts pretrained on MRQA datasets. Our results also indicate that adding more pretraining data does not necessarily improve performance. Our final model, an ensemble of ALBERT-XL models pretrained independently on CoQA and QuAC that selects the answer with the highest average probability score, achieves an F1 score of 70.9% on the official test set.
%R 10.18653/v1/2021.dialdoc-1.13
%U https://aclanthology.org/2021.dialdoc-1.13
%U https://doi.org/10.18653/v1/2021.dialdoc-1.13
%P 103-108
Markdown (Informal)
[Team JARS: DialDoc Subtask 1 - Improved Knowledge Identification with Supervised Out-of-Domain Pretraining](https://aclanthology.org/2021.dialdoc-1.13) (Khosla et al., dialdoc 2021)
ACL
Sopan Khosla, Justin Lovelace, Ritam Dutt, and Adithya Pratapa. 2021. [Team JARS: DialDoc Subtask 1 - Improved Knowledge Identification with Supervised Out-of-Domain Pretraining](https://aclanthology.org/2021.dialdoc-1.13). In *Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)*, pages 103–108, Online. Association for Computational Linguistics.
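The ensembling step the abstract describes (each independently pretrained model scores candidate answer spans, and the span with the highest average probability is chosen) can be sketched as below. This is a minimal illustration under assumptions, not the authors' released code; the score maps, span names, and probability values are hypothetical placeholders.

```python
# Minimal sketch (assumed, not the authors' code) of the ensemble
# selection described in the abstract: average each candidate span's
# probability across the independently pretrained QA models and
# return the span with the highest mean score.
from statistics import mean

# Hypothetical per-candidate probabilities; in the paper these would
# come from ALBERT-XL models pretrained on CoQA and QuAC respectively.
coqa_scores = {"span_a": 0.62, "span_b": 0.71, "span_c": 0.40}
quac_scores = {"span_a": 0.58, "span_b": 0.66, "span_c": 0.49}

def pick_answer(*score_maps: dict) -> str:
    """Return the candidate span with the highest average probability."""
    candidates = set().union(*score_maps)  # union of all candidate spans
    return max(candidates, key=lambda s: mean(m.get(s, 0.0) for m in score_maps))

print(pick_answer(coqa_scores, quac_scores))  # -> "span_b" (mean 0.685)
```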