AbstractReplication of research results has become more and more important in Natural Language Processing. Nevertheless, we still rely on results reported in the literature for comparison. Additionally, elements of an experimental setup are not always completely reported. This includes, but is not limited to reporting specific parameters used or omitting an implementational detail. In our experiment based on two frequently used data sets from the domain of automatic summarization and the seemingly full disclosure of research artefacts, we examine how well results reported are replicable and what elements influence the success or failure of replication. Our results indicate that publishing research artifacts is far from sufficient, that that publishing all relevant parameters in all possible detail is cruicial.