Questionable Answers in Question Answering Research: Reproducibility and Variability of Published Results

Matt Crane


Abstract
“Based on theoretical reasoning it has been suggested that the reliability of findings published in the scientific literature decreases with the popularity of a research field” (Pfeiffer and Hoffmann, 2009). Deep learning is currently a very popular field, and the ability to reproduce results is an important part of science. There is growing concern within the deep learning community about the reproducibility of published results. In this paper we present a number of controllable, yet unreported, effects that can substantially change the effectiveness of a sample model, and thus the reproducibility of those results. Through these environmental effects we show that the commonly held belief that distributing source code is all that is needed for reproducibility is mistaken. Source code without a reproducible environment does not mean anything at all. In addition, the range of results produced by these effects can be larger than the majority of the incremental improvements reported.
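The abstract's closing comparison sets run-to-run variability caused by the environment against the size of a reported improvement. The sketch below makes that comparison concrete. It is a minimal illustration under stated assumptions and is not the procedure used in the paper. It assumes a caller-supplied `evaluate(seed)` function that returns one score per run. The helpers `run_with_seeds`, `score_spread`, `improvement_within_noise` and `environment_snapshot` are hypothetical names introduced here. Seeding covers only Python's own `random` module. Deep learning frameworks and GPU libraries have their own random number generators and other sources of nondeterminism, and this sketch does not handle them.

```python
import platform
import random
import statistics
import sys
from typing import Callable, Dict, List


def run_with_seeds(evaluate: Callable[[int], float], seeds: List[int]) -> Dict[int, float]:
    """Run a caller-supplied evaluate(seed) once per seed (hypothetical helper)."""
    results = {}
    for seed in seeds:
        # Assumption: this seeds only Python's `random`; framework/GPU RNGs need separate handling.
        random.seed(seed)
        results[seed] = evaluate(seed)
    return results


def score_spread(scores: List[float]) -> Dict[str, float]:
    """Summarise how much a metric varies across repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to measure variability")
    return {
        "min": min(scores),
        "max": max(scores),
        "range": max(scores) - min(scores),
        "stdev": statistics.stdev(scores),
    }


def improvement_within_noise(baseline_scores: List[float], reported_gain: float) -> bool:
    """True if a reported gain is no larger than the baseline's own run-to-run range."""
    return reported_gain <= score_spread(baseline_scores)["range"]


def environment_snapshot() -> Dict[str, str]:
    """Record a few environment details to report next to a score.

    Hypothetical and incomplete: a real report would also list library versions,
    hardware, and thread or device settings.
    """
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    # Toy stand-in for training and evaluating a model; replace with a real pipeline.
    def toy_evaluate(seed: int) -> float:
        return 0.70 + 0.05 * random.random()

    scores = list(run_with_seeds(toy_evaluate, seeds=[1, 2, 3, 4, 5]).values())
    print(environment_snapshot())
    print(score_spread(scores))
    print(improvement_within_noise(scores, reported_gain=0.01))
```

Report the environment snapshot and the score spread alongside the headline number. A reader can then judge whether a claimed gain exceeds the variability that the environment alone can produce.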
Anthology ID:
Q18-1018
Volume:
Transactions of the Association for Computational Linguistics, Volume 6
Year:
2018
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova, Brian Roark
Venue:
TACL
Publisher:
MIT Press
Pages:
241–252
URL:
https://aclanthology.org/Q18-1018
DOI:
10.1162/tacl_a_00018
Cite (ACL):
Matt Crane. 2018. Questionable Answers in Question Answering Research: Reproducibility and Variability of Published Results. Transactions of the Association for Computational Linguistics, 6:241–252.
Cite (Informal):
Questionable Answers in Question Answering Research: Reproducibility and Variability of Published Results (Crane, TACL 2018)
PDF:
https://aclanthology.org/Q18-1018.pdf
Video:
https://aclanthology.org/Q18-1018.mp4
Data:
WikiQA