FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation

Zi-Yi Dou, Nanyun Peng


Abstract
Speaker-follower models have proven effective in vision-and-language navigation, where a speaker model synthesizes new instructions to augment the training data for a follower navigation model. However, in previous work, the speaker model is follower-agnostic and fails to take the state of the follower into consideration. In this paper, we present FOAM, a FOllower-Aware speaker Model that is continually updated based on follower feedback, so that the generated instructions better suit the follower's current learning state. Specifically, we optimize the speaker within a bi-level optimization framework, obtaining its training signals by evaluating the follower on labeled data. Experimental results on the Room-to-Room and Room-across-Room datasets demonstrate that our method outperforms strong baseline models across settings. Analyses also reveal that our generated instructions are of higher quality than those of the baselines.
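The bi-level idea in the abstract — an outer speaker update driven by how well the inner-loop-trained follower does on labeled data — can be illustrated with a toy scalar sketch. This is not the FOAM implementation (which uses neural speaker and follower networks); `inner_update`, `validation_loss`, and the finite-difference meta-gradient below are hypothetical stand-ins for illustration only.

```python
# Toy bi-level optimization sketch (hypothetical scalars, not FOAM itself).
# Outer level: speaker parameter `s` is updated from follower feedback.
# Inner level: follower parameter `f` is trained on speaker-generated data.

def inner_update(s, f, lr=0.1, steps=5):
    """Train the follower on data 'generated' by the speaker.
    Toy inner objective: the follower tries to match the speaker's output s."""
    for _ in range(steps):
        grad = 2 * (f - s)          # d/df of (f - s)^2
        f -= lr * grad
    return f

def validation_loss(f, target=3.0):
    """Evaluate the follower on labeled data (toy target)."""
    return (f - target) ** 2

def bilevel_step(s, f, meta_lr=0.2, eps=1e-3):
    """Update the speaker via a finite-difference estimate of the
    gradient of the follower's validation loss w.r.t. the speaker."""
    f_plus = inner_update(s + eps, f)
    f_minus = inner_update(s - eps, f)
    meta_grad = (validation_loss(f_plus) - validation_loss(f_minus)) / (2 * eps)
    s_new = s - meta_lr * meta_grad
    f_new = inner_update(s_new, f)  # retrain follower on updated speaker
    return s_new, f_new

s, f = 0.0, 0.0
for _ in range(50):
    s, f = bilevel_step(s, f)
# The speaker drifts toward generating data that makes the follower
# perform well on the labeled target.
print(round(s, 2), round(f, 2))
```

In the paper's setting, the inner loop corresponds to training the follower on speaker-synthesized instructions, and the outer signal comes from the follower's performance on labeled navigation data; the finite-difference meta-gradient here merely stands in for whatever gradient estimator the actual framework uses.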
Anthology ID:
2022.naacl-main.322
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4332–4340
URL:
https://aclanthology.org/2022.naacl-main.322
DOI:
10.18653/v1/2022.naacl-main.322
Cite (ACL):
Zi-Yi Dou and Nanyun Peng. 2022. FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4332–4340, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation (Dou & Peng, NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.322.pdf
Code
pluslabnlp/follower_aware_speaker
Data
RxR