Adversarial Domain Adaptation for Duplicate Question Detection

Darsh Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, Preslav Nakov


Abstract
We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. As finding and annotating such potential duplicates manually is very tedious and costly, automatic methods based on machine learning are a viable alternative. However, many forums do not have annotated data, i.e., questions labeled by experts as duplicates, and thus a promising solution is to use domain adaptation from another forum that has such annotations. Here we focus on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard. Our experiments with StackExchange data show an average improvement of 5.6% over the best baseline across multiple pairs of domains.
Anthology ID:
D18-1131
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1056–1063
URL:
https://aclanthology.org/D18-1131
DOI:
10.18653/v1/D18-1131
Cite (ACL):
Darsh Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, and Preslav Nakov. 2018. Adversarial Domain Adaptation for Duplicate Question Detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1056–1063, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Adversarial Domain Adaptation for Duplicate Question Detection (Shah et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1131.pdf