Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Wang


Abstract
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we also highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.
Anthology ID:
2022.acl-long.524
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7606–7623
URL:
https://aclanthology.org/2022.acl-long.524
DOI:
10.18653/v1/2022.acl-long.524
Cite (ACL):
Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, and Xin Wang. 2022. Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7606–7623, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions (Gu et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.524.pdf
Code:
eric-ai-lab/awesome-vision-language-navigation
Data:
ALFRED, Lani, RxR, StreetLearn, TEACh, Talk the Walk, VLN-CE