Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning

Daniele Bonadiman; Anjishnu Kumar; Arpit Mittal

doi:10.18653/v1/D19-5509

Abstract

The goal of a Question Paraphrase Retrieval (QPR) system is to retrieve equivalent questions that result in the same answer as the original question. Such a system can be used to understand and answer rare and noisy reformulations of common questions by mapping them to a set of canonical forms. This has large-scale applications for community Question Answering (cQA) and open-domain spoken language question answering systems. In this paper, we describe a new QPR system implemented as a Neural Information Retrieval (NIR) system consisting of a neural network sentence encoder and an approximate k-Nearest Neighbour index for efficient vector retrieval. We also describe our mechanism for automatically generating an annotated dataset for question paraphrase retrieval experiments from question-answer logs via distant supervision. We show that the standard loss function in NIR, triplet loss, does not perform well with noisy labels. We propose smoothed deep metric loss (SDML), and our experiments on two QPR datasets show that it significantly outperforms triplet loss in the noisy label setting.
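
The abstract contrasts triplet loss with the proposed smoothed loss but gives neither formula. As a rough illustration only, the sketch below assumes squared Euclidean distance, a standard margin-based triplet loss, and a label-smoothed softmax cross-entropy over in-batch candidates as a stand-in for a smoothed objective. The function names, the margin, and the smoothing value are hypothetical and are not taken from the paper.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def smoothed_batch_loss(queries, paraphrases, smoothing=0.1):
    """Cross-entropy between softmax(-distance) and label-smoothed targets.

    Assumption: paraphrases[i] is the (possibly noisy) paraphrase of
    queries[i], and every other row in the batch acts as a negative.
    """
    n = len(queries)
    if n < 2 or n != len(paraphrases):
        raise ValueError("need two or more aligned query/paraphrase pairs")
    total = 0.0
    for i, q in enumerate(queries):
        # Closer candidates get larger logits.
        logits = [-sq_dist(q, p) for p in paraphrases]
        # Numerically stable log-softmax over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        for j, logit in enumerate(logits):
            # Smoothed target: most mass on the labelled pair, the rest
            # spread uniformly over the other candidates.
            target = (1.0 - smoothing) if j == i else smoothing / (n - 1)
            total -= target * (logit - log_z)
    return total / n


if __name__ == "__main__":
    # Toy 2-D "embeddings"; a real system would use encoder outputs.
    queries = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
    paraphrases = [[0.1, 0.0], [1.0, 0.9], [3.0, 0.2]]
    print(triplet_loss(queries[0], paraphrases[0], paraphrases[2]))
    print(smoothed_batch_loss(queries, paraphrases))
```

Because the smoothed targets never put all probability mass on the labelled pair, a single mislabelled paraphrase contributes a bounded penalty, which is one plausible reading of why smoothing helps under noisy labels.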

Anthology ID: D19-5509
Volume: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 68–75
URL: https://aclanthology.org/D19-5509/
DOI: 10.18653/v1/D19-5509
Cite (ACL): Daniele Bonadiman, Anjishnu Kumar, and Arpit Mittal. 2019. Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 68–75, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning (Bonadiman et al., WNUT 2019)
PDF: https://aclanthology.org/D19-5509.pdf
