Reda Yacouby and Dustin Axman. 2020. Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems (editors: Steffen Eger, Yang Gao, Maxime Peyrard, Wei Zhao, and Eduard Hovy), pages 79-91, Online, November 2020. Association for Computational Linguistics. Conference publication.
Anthology ID: yacouby-axman-2020-probabilistic
DOI: 10.18653/v1/2020.eval4nlp-1.9
URL: https://aclanthology.org/2020.eval4nlp-1.9/