Vote’n’Rank: Revision of Benchmarking with Social Choice Theory

Mark Rofin, Vladislav Mikhailov, Mikhail Florinsky, Andrey Kravchenko, Tatiana Shavrina, Elena Tutubalina, Daniel Karabekyan, Ekaterina Artemova


Abstract
The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives. As the paradigm shifts towards more fine-grained evaluation across diverse tasks, the delicate question of how to aggregate performances has received particular interest in the community. In general, benchmarks follow unspoken utilitarian principles, where systems are ranked by their mean score over task-specific metrics. Such an aggregation procedure has been viewed as a sub-optimal evaluation protocol, which may have created the illusion of progress. This paper proposes Vote’n’Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory. We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields and to identify the best-performing systems in research and development case studies. Vote’n’Rank’s procedures are more robust than the mean while being able to handle missing performance scores and to determine the conditions under which a system becomes the winner.
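To make the contrast between mean aggregation and a voting-style rule concrete, here is a minimal sketch. It is not the authors' implementation: the scores are invented for illustration, the function names `mean_ranking` and `borda_ranking` are hypothetical, and a simplified Borda count (a classic social choice rule) stands in for whichever procedures the framework actually provides. Each task acts as a "voter" that ranks the systems by score.

```python
# Hypothetical per-task scores (higher is better); invented for illustration only.
scores = {
    "A": [0.99, 0.50, 0.50],  # very strong on one task, weakest on the rest
    "B": [0.60, 0.65, 0.65],
    "C": [0.55, 0.60, 0.60],
}


def mean_ranking(scores):
    """Rank systems by their mean score over tasks (the utilitarian baseline)."""
    means = {s: sum(v) / len(v) for s, v in scores.items()}
    # Sort by descending mean; break ties alphabetically for determinism.
    return sorted(means, key=lambda s: (-means[s], s))


def borda_ranking(scores):
    """Rank systems with a simplified Borda count.

    On every task, a system earns one point for each other system
    it strictly outperforms; points are summed across tasks.
    """
    systems = list(scores)
    n_tasks = len(next(iter(scores.values())))
    points = {s: 0 for s in systems}
    for t in range(n_tasks):
        for s in systems:
            points[s] += sum(
                scores[s][t] > scores[o][t] for o in systems if o != s
            )
    # Sort by descending points; break ties alphabetically for determinism.
    return sorted(points, key=lambda s: (-points[s], s))


print(mean_ranking(scores))   # ['A', 'B', 'C']
print(borda_ranking(scores))  # ['B', 'A', 'C']
```

In this toy setting, system A wins under the mean because of a single outlying task score, while the Borda-style count prefers B, which is ahead on two of the three tasks. This is only meant to illustrate why the choice of aggregation rule can change the winner.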
Anthology ID:
2023.eacl-main.48
Original:
2023.eacl-main.48v1
Version 2:
2023.eacl-main.48v2
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
670–686
URL:
https://aclanthology.org/2023.eacl-main.48
DOI:
10.18653/v1/2023.eacl-main.48
Cite (ACL):
Mark Rofin, Vladislav Mikhailov, Mikhail Florinsky, Andrey Kravchenko, Tatiana Shavrina, Elena Tutubalina, Daniel Karabekyan, and Ekaterina Artemova. 2023. Vote’n’Rank: Revision of Benchmarking with Social Choice Theory. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 670–686, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Vote’n’Rank: Revision of Benchmarking with Social Choice Theory (Rofin et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.48.pdf
Video:
https://aclanthology.org/2023.eacl-main.48.mp4