@inproceedings{zhang-kordjamshidi-2023-vln,
    title = "{VLN}-Trans: Translator for the Vision and Language Navigation Agent",
    author = "Zhang, Yue and
      Kordjamshidi, Parisa",
    editor = "Rogers, Anna and
      Boyd-Graber, Jordan and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.737",
    doi = "10.18653/v1/2023.acl-long.737",
    pages = "13219--13233",
    abstract = "Language understanding is essential for the navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: 1. The mentioned landmarks are not recognizable by the navigation agent due to the different vision abilities of the instructor and the modeled agent. 2. The mentioned landmarks are applicable to multiple targets, thus not distinctive for selecting the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on the recognizable and distinctive landmarks based on the agent{'}s visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on Room2Room (R2R), Room4room (R4R), and Room2Room Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="zhang-kordjamshidi-2023-vln">
    <titleInfo>
      <title>VLN-Trans: Translator for the Vision and Language Navigation Agent</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Yue</namePart>
      <namePart type="family">Zhang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Parisa</namePart>
      <namePart type="family">Kordjamshidi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Rogers</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jordan</namePart>
        <namePart type="family">Boyd-Graber</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Naoaki</namePart>
        <namePart type="family">Okazaki</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Toronto, Canada</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Language understanding is essential for the navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: 1. The mentioned landmarks are not recognizable by the navigation agent due to the different vision abilities of the instructor and the modeled agent. 2. The mentioned landmarks are applicable to multiple targets, thus not distinctive for selecting the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on the recognizable and distinctive landmarks based on the agent’s visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on Room2Room (R2R), Room4room (R4R), and Room2Room Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.</abstract>
    <identifier type="citekey">zhang-kordjamshidi-2023-vln</identifier>
    <identifier type="doi">10.18653/v1/2023.acl-long.737</identifier>
    <location>
      <url>https://aclanthology.org/2023.acl-long.737</url>
    </location>
    <part>
      <date>2023-07</date>
      <extent unit="page">
        <start>13219</start>
        <end>13233</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T VLN-Trans: Translator for the Vision and Language Navigation Agent
%A Zhang, Yue
%A Kordjamshidi, Parisa
%Y Rogers, Anna
%Y Boyd-Graber, Jordan
%Y Okazaki, Naoaki
%S Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F zhang-kordjamshidi-2023-vln
%X Language understanding is essential for the navigation agent to follow instructions. We observe two kinds of issues in the instructions that can make the navigation task challenging: 1. The mentioned landmarks are not recognizable by the navigation agent due to the different vision abilities of the instructor and the modeled agent. 2. The mentioned landmarks are applicable to multiple targets, thus not distinctive for selecting the target among the candidate viewpoints. To deal with these issues, we design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations at each step. The translator needs to focus on the recognizable and distinctive landmarks based on the agent’s visual abilities and the observed visual environment. To achieve this goal, we create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent. We evaluate our approach on Room2Room (R2R), Room4room (R4R), and Room2Room Last (R2R-Last) datasets and achieve state-of-the-art results on multiple benchmarks.
%R 10.18653/v1/2023.acl-long.737
%U https://aclanthology.org/2023.acl-long.737
%U https://doi.org/10.18653/v1/2023.acl-long.737
%P 13219-13233
Markdown (Informal)
[VLN-Trans: Translator for the Vision and Language Navigation Agent](https://aclanthology.org/2023.acl-long.737) (Zhang & Kordjamshidi, ACL 2023)
ACL
Yue Zhang and Parisa Kordjamshidi. 2023. VLN-Trans: Translator for the Vision and Language Navigation Agent. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13219–13233, Toronto, Canada. Association for Computational Linguistics.