@inproceedings{sertkan-etal-2023-ranger,
title = "Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation",
author = {Sertkan, Mete and
Althammer, Sophia and
Hofst{\"a}tter, Sebastian},
editor = "Bollegala, Danushka and
Huang, Ruihong and
Ritter, Alan",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-demo.56/",
doi = "10.18653/v1/2023.acl-demo.56",
pages = "581--587",
abstract = "In this paper, we introduce Ranger - a toolkit to facilitate the easy use of effect-size-based meta-analysis for multi-task evaluation in NLP and IR. We observed that our communities often face the challenge of aggregating results over incomparable metrics and scenarios, which makes conclusions and take-away messages less reliable. With Ranger, we aim to address this issue by providing a task-agnostic toolkit that combines the effect of a treatment on multiple tasks into one statistical evaluation, allowing for comparison of metrics and computation of an overall summary effect. Our toolkit produces publication-ready forest plots that enable clear communication of evaluation results over multiple tasks. Our goal with the ready-to-use Ranger toolkit is to promote robust, effect-size-based evaluation and improve evaluation standards in the community. We provide two case studies for common IR and NLP settings to highlight Ranger`s benefits."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="sertkan-etal-2023-ranger">
    <titleInfo>
      <title>Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Mete</namePart>
      <namePart type="family">Sertkan</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Sophia</namePart>
      <namePart type="family">Althammer</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Sebastian</namePart>
      <namePart type="family">Hofstätter</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Danushka</namePart>
        <namePart type="family">Bollegala</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ruihong</namePart>
        <namePart type="family">Huang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Alan</namePart>
        <namePart type="family">Ritter</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Toronto, Canada</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper, we introduce Ranger - a toolkit to facilitate the easy use of effect-size-based meta-analysis for multi-task evaluation in NLP and IR. We observed that our communities often face the challenge of aggregating results over incomparable metrics and scenarios, which makes conclusions and take-away messages less reliable. With Ranger, we aim to address this issue by providing a task-agnostic toolkit that combines the effect of a treatment on multiple tasks into one statistical evaluation, allowing for comparison of metrics and computation of an overall summary effect. Our toolkit produces publication-ready forest plots that enable clear communication of evaluation results over multiple tasks. Our goal with the ready-to-use Ranger toolkit is to promote robust, effect-size-based evaluation and improve evaluation standards in the community. We provide two case studies for common IR and NLP settings to highlight Ranger's benefits.</abstract>
    <identifier type="citekey">sertkan-etal-2023-ranger</identifier>
    <identifier type="doi">10.18653/v1/2023.acl-demo.56</identifier>
    <location>
      <url>https://aclanthology.org/2023.acl-demo.56/</url>
    </location>
    <part>
      <date>2023-07</date>
      <extent unit="page">
        <start>581</start>
        <end>587</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation
%A Sertkan, Mete
%A Althammer, Sophia
%A Hofstätter, Sebastian
%Y Bollegala, Danushka
%Y Huang, Ruihong
%Y Ritter, Alan
%S Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F sertkan-etal-2023-ranger
%X In this paper, we introduce Ranger - a toolkit to facilitate the easy use of effect-size-based meta-analysis for multi-task evaluation in NLP and IR. We observed that our communities often face the challenge of aggregating results over incomparable metrics and scenarios, which makes conclusions and take-away messages less reliable. With Ranger, we aim to address this issue by providing a task-agnostic toolkit that combines the effect of a treatment on multiple tasks into one statistical evaluation, allowing for comparison of metrics and computation of an overall summary effect. Our toolkit produces publication-ready forest plots that enable clear communication of evaluation results over multiple tasks. Our goal with the ready-to-use Ranger toolkit is to promote robust, effect-size-based evaluation and improve evaluation standards in the community. We provide two case studies for common IR and NLP settings to highlight Ranger's benefits.
%R 10.18653/v1/2023.acl-demo.56
%U https://aclanthology.org/2023.acl-demo.56/
%U https://doi.org/10.18653/v1/2023.acl-demo.56
%P 581-587
Markdown (Informal)
[Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation](https://aclanthology.org/2023.acl-demo.56/) (Sertkan et al., ACL 2023)
ACL
Mete Sertkan, Sophia Althammer, and Sebastian Hofstätter. 2023. Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 581–587, Toronto, Canada. Association for Computational Linguistics.