Reference-based Weak Supervision for Answer Sentence Selection using Web Data

Vivek Krishnamurthy, Thuy Vu, Alessandro Moschitti


Abstract
Answer Sentence Selection (AS2) models are core components of efficient retrieval-based Question Answering (QA) systems. We present Reference-based Weak Supervision (RWS), a fully automatic large-scale data pipeline that harvests high-quality weakly-supervised answer sentences from Web data, requiring only a question-reference pair as input. We evaluated the quality of the RWS-derived data by training TANDA models, which are the state of the art for AS2. Our results show that the data consistently bolsters TANDA on three different datasets. In particular, we set the new state of the art for AS2 to P@1 = 90.1% and MAP = 92.9% on WikiQA. We record similar performance gains of RWS on a much larger dataset named Web-based Question Answering (WQA).
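For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how P@1 and MAP are commonly computed for a ranking task such as AS2. It is not taken from the paper: the function names, the toy data, and the choice to score a question with no correct candidate as 0 are all assumptions made for illustration, and evaluation setups differ on that last point.

```python
from typing import Dict, List


def precision_at_1(ranked_labels: List[int]) -> float:
    """Return 1.0 if the top-ranked candidate is a correct answer, else 0.0."""
    return float(ranked_labels[0] == 1) if ranked_labels else 0.0


def average_precision(ranked_labels: List[int]) -> float:
    """Mean of the precision values taken at each rank holding a correct answer."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a question with no correct candidate scores 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def evaluate(rankings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average both metrics over questions; the averaged AP is MAP."""
    n = len(rankings)
    return {
        "P@1": sum(precision_at_1(r) for r in rankings.values()) / n,
        "MAP": sum(average_precision(r) for r in rankings.values()) / n,
    }


# Hypothetical toy data: each list holds the gold labels (1 = correct answer)
# of one question's candidates, already sorted by descending model score.
toy = {
    "q1": [1, 0, 0],  # correct answer ranked first
    "q2": [0, 1, 1],  # correct answers at ranks 2 and 3
}
scores = evaluate(toy)
print(scores)
```

Here each list is assumed to be sorted by the model's score before evaluation, so the metrics depend only on where the correct answers fall in the ranking.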
Anthology ID:
2021.findings-emnlp.363
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4294–4299
URL:
https://aclanthology.org/2021.findings-emnlp.363
DOI:
10.18653/v1/2021.findings-emnlp.363
Cite (ACL):
Vivek Krishnamurthy, Thuy Vu, and Alessandro Moschitti. 2021. Reference-based Weak Supervision for Answer Sentence Selection using Web Data. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4294–4299, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Reference-based Weak Supervision for Answer Sentence Selection using Web Data (Krishnamurthy et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.363.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.363.mp4
Data
ASNQ, Natural Questions, TrecQA, WikiQA