Kaiser Sun, Adina Williams, and Dieuwke Hupkes. 2023. The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks. In Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL), pages 274-293, Singapore, December 2023. Edited by Jing Jiang, David Reitter, and Shumin Deng. Association for Computational Linguistics. Type: conference publication. Citation key: sun-etal-2023-validity. DOI: 10.18653/v1/2023.conll-1.19. URL: https://aclanthology.org/2023.conll-1.19/