When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task

Viktoria Paraskevi Daniilidou, Nikolai Ilinykh, Vladislav Maraev


Abstract
We introduce a multimodal framework for interactive drawing in a robot-assisted second language learning scenario. In this scenario, humans are asked to draw objects and spatial relations between them, while a social robot uses a vision-language model (VLM) to analyse whether the drawings are correct. The correctness decision passed to the human comes from a Wizard-of-Oz (WoZ) setup, which we therefore use to indirectly evaluate the quality of VLM predictions. We show that the task is very challenging for a VLM and that the choice of evaluation approach matters: focusing on the correctness of predictions of certain features (objects, relations) yields a different evaluation picture than evaluating the model on predicting the content of the image as a whole. We also examine, through a questionnaire, how the appearance of the social agent and the type of feedback influence participants' perception of the agent. Comparing verbal feedback generated by a large language model against simple pattern-based feedback showed no significant effects, whereas changing the robot's appearance led to significant differences in user ratings of the agent's naturalness and social presence.
Anthology ID:
2026.iwsds-1.25
Volume:
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
Month:
February
Year:
2026
Address:
Trento, Italy
Editors:
Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
Venue:
IWSDS
Publisher:
Association for Computational Linguistics
Pages:
239–252
URL:
https://aclanthology.org/2026.iwsds-1.25/
Cite (ACL):
Viktoria Paraskevi Daniilidou, Nikolai Ilinykh, and Vladislav Maraev. 2026. When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 239–252, Trento, Italy. Association for Computational Linguistics.
Cite (Informal):
When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task (Daniilidou et al., IWSDS 2026)
PDF:
https://aclanthology.org/2026.iwsds-1.25.pdf