Text2Afford: Probing Object Affordance Prediction abilities of Language Models solely from Text

Sayantan Adak; Daivik Agrawal; Animesh Mukherjee; Somak Aditya

Abstract

We investigate the knowledge of object affordances in pre-trained language models (LMs) and pre-trained Vision-Language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances – Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances.

Anthology ID: 2024.conll-1.27
Volume: Proceedings of the 28th Conference on Computational Natural Language Learning
Month: November
Year: 2024
Address: Miami, FL, USA
Editors: Libby Barak, Malihe Alikhani
Venue: CoNLL
Publisher: Association for Computational Linguistics
Pages: 342–364
URL: https://aclanthology.org/2024.conll-1.27
Cite (ACL): Sayantan Adak, Daivik Agrawal, Animesh Mukherjee, and Somak Aditya. 2024. Text2Afford: Probing Object Affordance Prediction abilities of Language Models solely from Text. In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 342–364, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal): Text2Afford: Probing Object Affordance Prediction abilities of Language Models solely from Text (Adak et al., CoNLL 2024)
PDF: https://aclanthology.org/2024.conll-1.27.pdf
