Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
Authors: Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, Marzieh Fadaee
In: Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 339-352. Association for Computational Linguistics, Singapore, December 2023.
Editors: Sebastian Gehrmann, Alex Wang, João Sedoc, Elizabeth Clark, Kaustubh Dhole, Khyathi Raghavi Chandu, Enrico Santus, Hooman Sedghamiz
Type: conference publication (text)
Anthology ID: boubdir-etal-2023-elo
URL: https://aclanthology.org/2023.gem-1.28/