Oscar Sainz, Jon Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, and Eneko Agirre. 2023. NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark. In Findings of the Association for Computational Linguistics: EMNLP 2023, edited by Houda Bouamor, Juan Pino, and Kalika Bali, pages 10776-10787, Singapore, December 2023. Association for Computational Linguistics. Conference publication. ID: sainz-etal-2023-nlp. DOI: 10.18653/v1/2023.findings-emnlp.722. URL: https://aclanthology.org/2023.findings-emnlp.722/