Human-Model Divergence in the Handling of Vagueness
Elias Stengel-Eskin | Jimena Guallar-Blasco | Benjamin Van Durme
Proceedings of the 1st Workshop on Understanding Implicit and Underspecified Language
While aggregate performance metrics can generate valuable insights at a large scale, their dominance means more complex and nuanced language phenomena, such as vagueness, may be overlooked. Focusing on vague terms (e.g., sunny, cloudy, young), we inspect the behavior of visually grounded and text-only models, finding systematic divergences from human judgments even when a model's overall performance is high. To help explain this disparity, we identify two assumptions made by the datasets and models examined and, guided by the philosophy of vagueness, isolate cases where they do not hold.