Jivko Sinapov
2024
Creative Problem Solving in Large Language and Vision Models - What Would it Take?
Lakshmi Nair | Evana Gizzi | Jivko Sinapov
Findings of the Association for Computational Linguistics: EMNLP 2024
We advocate for a strong integration of Computational Creativity (CC) with research in large language and vision models (LLVMs) to address a key limitation of these models, i.e., creative problem solving. We present preliminary experiments showing how CC principles can be applied to address this limitation. Our goal is to foster discussions on creative problem solving in LLVMs and CC at prestigious ML venues.
2017
Guiding Interaction Behaviors for Multi-modal Grounded Language Learning
Jesse Thomason | Jivko Sinapov | Raymond Mooney
Proceedings of the First Workshop on Language Grounding for Robotics
Multi-modal grounded language learning connects language predicates to physical properties of objects in the world. Sensing with multiple modalities, such as audio, haptics, and visual colors and shapes, while performing interaction behaviors like lifting, dropping, and looking at objects enables a robot to ground non-visual predicates like “empty” as well as visual predicates like “red”. Previous work has established that grounding in multi-modal space improves performance on object retrieval from human descriptions. In this work, we gather behavior annotations from humans and demonstrate that these improve language grounding performance by allowing a system to focus on relevant behaviors for words like “white” or “half-full” that can be understood by looking or lifting, respectively. We also explore adding modality annotations (whether to focus on audio or haptics when performing a behavior), which improves performance, and sharing information between linguistically related predicates (if “green” is a color, “white” is a color), which improves grounding recall but at the cost of precision.