Exploring Variation of Results from Different Experimental Conditions

Maja Popović, Mohammad Arvan, Natalie Parde, Anya Belz


Abstract
It might reasonably be expected that running multiple experiments for the same task using the same data and model would yield very similar results. Recent research has, however, shown this not to be the case for many NLP experiments. In this paper, we report extensive coordinated work by two NLP groups to run the training and testing pipeline for three neural text simplification (NTS) models under varying experimental conditions, including different random seeds, run-time environments, and dependency versions, yielding a large number of results for each of the three models using the same data and train/dev/test set splits. From one perspective, these results can be interpreted as shedding light on the reproducibility of evaluation results for the three NTS models, and we present an in-depth analysis of the variation observed for different combinations of experimental conditions. From another perspective, the results raise the question of whether the averaged score should be considered the ‘true’ result for each model.
Anthology ID:
2023.findings-acl.172
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2746–2757
URL:
https://aclanthology.org/2023.findings-acl.172
DOI:
10.18653/v1/2023.findings-acl.172
Cite (ACL):
Maja Popović, Mohammad Arvan, Natalie Parde, and Anya Belz. 2023. Exploring Variation of Results from Different Experimental Conditions. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2746–2757, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Exploring Variation of Results from Different Experimental Conditions (Popović et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.172.pdf