GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation

Aman Jain, Teruhisa Misu, Kentaro Yamada, Hitomi Yanaka
Abstract
The Vision-and-Language Navigation (VLN) task involves navigating a mobile agent using linguistic commands and has applications in developing interfaces for autonomous mobility. In reality, natural human communication also encompasses non-verbal cues such as hand gestures and gaze. Such gesture-guided instructions have been explored in Human-Robot Interaction systems for effective interaction, particularly in object-referring expressions. However, a notable gap exists in handling gesture-based demonstrative expressions in the outdoor VLN task. To address this, we introduce a novel dataset of gesture-guided outdoor VLN instructions with demonstrative expressions, designed with a focus on complex instructions requiring multi-hop reasoning between multiple input modalities. Our work also includes a comprehensive analysis of the collected data and a comparative evaluation against existing datasets.
Anthology ID:
2024.eacl-srw.23
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Neele Falk, Sara Papi, Mike Zhang
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
290–295
URL:
https://aclanthology.org/2024.eacl-srw.23
Cite (ACL):
Aman Jain, Teruhisa Misu, Kentaro Yamada, and Hitomi Yanaka. 2024. GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 290–295, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation (Jain et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-srw.23.pdf