Soft metrics for evaluation with disagreements: an assessment

Giulia Rizzi, Elisa Leonardelli, Massimo Poesio, Alexandra Uma, Maja Pavlovic, Silviu Paun, Paolo Rosso, Elisabetta Fersini


Abstract
The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy these properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real-case scenario. Our results indicate that Cross Entropy can yield fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit a more intuitive behavior, at least for the case of binary classification.
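As a concrete illustration of the kind of paradox the abstract alludes to, consider the following minimal Python sketch (ours, not taken from the paper). Because cross entropy H(p, q) = -Σ_i p_i log q_i is bounded below by the entropy of the target p, a model that perfectly predicts a maximally uncertain soft label can still score worse under Cross Entropy than a model that imperfectly predicts a near-unanimous one; Manhattan and Euclidean distance, by contrast, give the perfect prediction a score of 0.

    import numpy as np

    def cross_entropy(target, pred, eps=1e-12):
        # H(target, pred) = -sum_i target_i * log(pred_i); eps guards against log(0)
        return -np.sum(target * np.log(np.clip(pred, eps, 1.0)))

    def manhattan(target, pred):
        # L1 distance between the two soft-label distributions
        return np.sum(np.abs(target - pred))

    def euclidean(target, pred):
        # L2 distance between the two soft-label distributions
        return np.sqrt(np.sum((target - pred) ** 2))

    # Perfect prediction of a maximally uncertain (50/50) soft label ...
    t_a, p_a = np.array([0.5, 0.5]), np.array([0.5, 0.5])
    # ... versus an imperfect prediction of a near-unanimous one.
    t_b, p_b = np.array([0.9, 0.1]), np.array([0.99, 0.01])

    print(cross_entropy(t_a, p_a))                   # ~0.693 (the entropy of the target)
    print(cross_entropy(t_b, p_b))                   # ~0.470: lower, despite the error
    print(manhattan(t_a, p_a), manhattan(t_b, p_b))  # 0.0 vs 0.18
    print(euclidean(t_a, p_a), euclidean(t_b, p_b))  # 0.0 vs ~0.127

Manhattan and Euclidean distance are zero exactly when prediction and target coincide, which matches the intuition that a perfect soft prediction should receive the best possible score.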
Anthology ID:
2024.nlperspectives-1.9
Volume:
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Gavin Abercrombie, Valerio Basile, Davide Bernardi, Shiran Dudy, Simona Frenda, Lucy Havens, Sara Tonelli
Venues:
NLPerspectives | WS
Publisher:
ELRA and ICCL
Pages:
84–94
URL:
https://aclanthology.org/2024.nlperspectives-1.9
Cite (ACL):
Giulia Rizzi, Elisa Leonardelli, Massimo Poesio, Alexandra Uma, Maja Pavlovic, Silviu Paun, Paolo Rosso, and Elisabetta Fersini. 2024. Soft metrics for evaluation with disagreements: an assessment. In Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024, pages 84–94, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Soft metrics for evaluation with disagreements: an assessment (Rizzi et al., NLPerspectives-WS 2024)
PDF:
https://aclanthology.org/2024.nlperspectives-1.9.pdf