Nibbling at the Hard Core of Word Sense Disambiguation

Marco Maru, Simone Conia, Michele Bevilacqua, Roberto Navigli


Abstract
With state-of-the-art systems having finally attained estimated human performance, Word Sense Disambiguation (WSD) has now joined the array of Natural Language Processing tasks that have seemingly been solved, thanks to the vast amounts of knowledge encoded into Transformer-based pre-trained language models. And yet, if we look below the surface of raw figures, it is easy to realize that current approaches still make trivial mistakes that a human would never make. In this work, we provide evidence showing why the F1 score metric should not simply be taken at face value and present an exhaustive analysis of the errors that seven of the most representative state-of-the-art systems for English all-words WSD make on traditional evaluation benchmarks. In addition, we produce and release a collection of test sets featuring (a) an amended version of the standard evaluation benchmark that fixes its lexical and semantic inaccuracies, (b) 42D, a challenge set devised to assess the resilience of systems with respect to least frequent word senses and senses not seen at training time, and (c) hardEN, a challenge set made up solely of instances which none of the investigated state-of-the-art systems can solve. We make all of the test sets and model predictions available to the research community at https://github.com/SapienzaNLP/wsd-hard-benchmark.
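As a rough illustration of the idea behind a challenge set such as hardEN, the sketch below keeps only the instances that no system labels correctly and computes a simple F1 score per system. It is not the authors' released code: the function names, the instance ids, the sense labels, and the in-memory dictionary format are all assumptions made for this example.

```python
from typing import Dict, Set

# Hypothetical toy data: instance id -> set of acceptable gold senses.
GOLD: Dict[str, Set[str]] = {
    "d0.s0.t0": {"bank%1"},
    "d0.s0.t1": {"run%2", "run%3"},
    "d0.s1.t0": {"plant%1"},
}

# Hypothetical predictions: system name -> {instance id -> predicted sense}.
PREDICTIONS: Dict[str, Dict[str, str]] = {
    "system_a": {"d0.s0.t0": "bank%1", "d0.s0.t1": "run%1", "d0.s1.t0": "plant%2"},
    "system_b": {"d0.s0.t0": "bank%2", "d0.s0.t1": "run%1", "d0.s1.t0": "plant%1"},
}


def f1_score(gold: Dict[str, Set[str]], system: Dict[str, str]) -> float:
    """F1 over instances; a prediction counts if it matches any gold sense."""
    correct = sum(1 for i, sense in system.items() if sense in gold.get(i, set()))
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hard_instances(
    gold: Dict[str, Set[str]], predictions: Dict[str, Dict[str, str]]
) -> Set[str]:
    """Ids of instances that none of the given systems answers correctly."""
    hard = set()
    for instance_id, gold_senses in gold.items():
        solved = any(
            system.get(instance_id) in gold_senses
            for system in predictions.values()
        )
        if not solved:
            hard.add(instance_id)
    return hard


if __name__ == "__main__":
    for name, system in PREDICTIONS.items():
        print(name, round(f1_score(GOLD, system), 3))
    print(sorted(hard_instances(GOLD, PREDICTIONS)))
```

In this toy setup both systems reach the same F1 score while failing on different instances, and one instance is missed by both, which is the kind of detail an aggregate score hides.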
Anthology ID:
2022.acl-long.324
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4724–4737
URL:
https://aclanthology.org/2022.acl-long.324
DOI:
10.18653/v1/2022.acl-long.324
Cite (ACL):
Marco Maru, Simone Conia, Michele Bevilacqua, and Roberto Navigli. 2022. Nibbling at the Hard Core of Word Sense Disambiguation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4724–4737, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Nibbling at the Hard Core of Word Sense Disambiguation (Maru et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.324.pdf
Code:
sapienzanlp/wsd-hard-benchmark
Data:
Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison