An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results
Enrique Amigo | Julio Gonzalo | Stefano Mizzaro | Jorge Carrillo-de-Albornoz
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
In Ordinal Classification tasks, items have to be assigned to classes that have a relative ordering, such as “positive”, “neutral”, “negative” in sentiment analysis. Remarkably, the most popular evaluation metrics for ordinal classification tasks either ignore relevant information (for instance, precision/recall on each of the classes ignores their relative ordering) or assume additional information (for instance, Mean Average Error assumes absolute distances between classes). In this paper we propose a new metric for Ordinal Classification, Closeness Evaluation Measure, that is rooted on Measurement Theory and Information Theory. Our theoretical analysis and experimental results over both synthetic data and data from NLP shared tasks indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously. In addition, it generalizes some popular classification (nominal scale) and error minimization (interval scale) metrics, depending on the measurement scale in which it is instantiated.