Goal-oriented Vision-and-Dialog Navigation via Reinforcement Learning

Yan Cao; Keting Lu; David DeFazio; Shiqi Zhang

doi:10.18653/v1/2022.findings-emnlp.327

Abstract

Vision-and-dialog navigation is a recent benchmark for evaluating the AI capabilities of perception, interaction, and decision making. While existing methods developed for this benchmark have demonstrated great success, they mostly rely on large datasets, which can be challenging to collect, and the learned policies do not adapt to domain changes. In this paper, we focus on a new problem, referred to as goal-oriented vision-and-dialog navigation (GVDN), where an agent uses reinforcement learning techniques to compute dialog-navigation policies from trial and error. A robot conducts visual navigation to locate target objects, and can talk to a remote human operator as needed. Our remote human is able to provide guidance on navigation only if the robot correctly conveys its location through dialog. Experiments have been conducted using photo-realistic simulation environments. Results suggest that our agent outperforms competitive baselines in success rate.

Anthology ID: 2022.findings-emnlp.327
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4473–4482
URL: https://aclanthology.org/2022.findings-emnlp.327
DOI: 10.18653/v1/2022.findings-emnlp.327
Cite (ACL): Yan Cao, Keting Lu, David DeFazio, and Shiqi Zhang. 2022. Goal-oriented Vision-and-Dialog Navigation via Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4473–4482, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Goal-oriented Vision-and-Dialog Navigation via Reinforcement Learning (Cao et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.327.pdf
Video: https://aclanthology.org/2022.findings-emnlp.327.mp4