Viktoria Paraskevi Daniilidou
2026
When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task
Viktoria Paraskevi Daniilidou | Nikolai Ilinykh | Vladislav Maraev
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
We introduce a multimodal framework for interactive drawing in a robot-assisted second language learning scenario. In this scenario, humans are asked to draw objects and the spatial relations between them, while a social robot uses a vision-language model (VLM) to analyse whether the drawings are correct. The correctness decision passed to the human comes from a Wizard-of-Oz (WoZ) setup, which we therefore use to indirectly evaluate the quality of the VLM's predictions. We show that the task is very challenging for a VLM and that the choice of evaluation approach matters: focusing on the correctness of predictions for specific features (objects, relations) yields a different picture from evaluating the model on its prediction of the image content as a whole. We also examine, through a questionnaire, how the appearance of the social agent and the type of feedback influence participants' perception of the agent. Comparing verbal feedback generated by large language models against simple pattern-based feedback showed no significant effects, whereas changing the robot's appearance produced significant differences in user ratings of the agent's naturalness and social presence.