Neural Paraphrase Identification of Questions with Noisy Pretraining

Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das


Abstract
We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (replacing the word embeddings of the decomposable attention model of Parikh et al. 2016 with character n-gram representations) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.
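The abstract describes a variant of the decomposable attention model of Parikh et al. (2016), which scores a question pair via an attend, compare, and aggregate pipeline over token representations (here, sums of character n-gram embeddings rather than word embeddings). A minimal numpy sketch of that pipeline is below; it is illustrative only, and the trainable feed-forward networks F, G, and H of the original model are replaced by identity maps for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decomposable_attention(a, b):
    """Attend-compare-aggregate skeleton (after Parikh et al., 2016).

    a: (la, d) token embeddings of question 1
    b: (lb, d) token embeddings of question 2
    Returns a fixed-size representation of the pair, which in the full
    model would be fed to a classifier H. The feed-forward nets F and G
    are omitted here (identity) for illustration.
    """
    # Attend: soft-align each token in one question to the other.
    scores = a @ b.T                        # (la, lb) unnormalized alignments
    beta = softmax(scores, axis=1) @ b      # (la, d) aligned phrase for each a_i
    alpha = softmax(scores, axis=0).T @ a   # (lb, d) aligned phrase for each b_j

    # Compare: pair each token with its soft-aligned counterpart.
    v1 = np.concatenate([a, beta], axis=1)   # (la, 2d)
    v2 = np.concatenate([b, alpha], axis=1)  # (lb, 2d)

    # Aggregate: sum over tokens and concatenate both directions.
    return np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])  # (4d,)
```

In the paper's variant, each token vector would be the sum of its character n-gram embeddings, which the abstract reports as the key change over word-level embeddings.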
Anthology ID:
W17-4121
Volume:
Proceedings of the First Workshop on Subword and Character Level Models in NLP
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venues:
SCLeM | WS
Publisher:
Association for Computational Linguistics
Pages:
142–147
URL:
https://aclanthology.org/W17-4121
DOI:
10.18653/v1/W17-4121
Cite (ACL):
Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, and Dipanjan Das. 2017. Neural Paraphrase Identification of Questions with Noisy Pretraining. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 142–147, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Neural Paraphrase Identification of Questions with Noisy Pretraining (Tomar et al., 2017)
PDF:
https://aclanthology.org/W17-4121.pdf
Data:
GLUE | Paralex | Quora Question Pairs