A strong baseline for question relevancy ranking

Ana Gonzalez, Isabelle Augenstein, Anders Søgaard


Abstract
The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks – a task that amounts to question relevancy ranking – involve complex pipelines and manual feature engineering. Despite this, many of these systems still fail to beat the IR baseline, i.e., the rankings provided by Google’s search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed-forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.
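To make the idea of representing a question pair as a vector of distance measures concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not list the 14 measures, so the three below (Jaccard, cosine over token counts, and normalized length difference) are illustrative assumptions, as are the function names and the whitespace tokenizer.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the two token sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def cosine_distance(a, b):
    """1 - cosine similarity of the two bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    sq_a = sum(v * v for v in ca.values())
    sq_b = sum(v * v for v in cb.values())
    if sq_a == 0 or sq_b == 0:
        return 1.0
    return 1.0 - dot / math.sqrt(sq_a * sq_b)


def length_difference(a, b):
    """Absolute difference in token count, normalized by the longer question."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return abs(len(a) - len(b)) / longest


def distance_features(q1, q2):
    """Illustrative feature vector for a question pair (3 measures, not the paper's 14)."""
    a, b = tokenize(q1), tokenize(q2)
    return [jaccard_distance(a, b), cosine_distance(a, b), length_difference(a, b)]
```

In a setup like the one the abstract describes, a vector of this kind would be the input to the feed-forward network for each pair of new and previously asked questions. None of these measures depends on language-specific resources, which is in keeping with the language-independent features the abstract mentions.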
Anthology ID:
D18-1515
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4810–4815
URL:
https://aclanthology.org/D18-1515/
DOI:
10.18653/v1/D18-1515
Cite (ACL):
Ana Gonzalez, Isabelle Augenstein, and Anders Søgaard. 2018. A strong baseline for question relevancy ranking. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4810–4815, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
A strong baseline for question relevancy ranking (Gonzalez et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1515.pdf
Video:
https://aclanthology.org/D18-1515.mp4