Language Models are Few-Shot Butlers

Vincent Micheli, Francois Fleuret


Abstract
Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets. Hence, these autoregressive models constitute ideal agents to operate in text-based environments where language understanding and generative capabilities are essential. Nonetheless, collecting expert demonstrations in such environments is a time-consuming endeavour. We introduce a two-stage procedure to learn from a small set of demonstrations and further improve by interacting with an environment. We show that language models fine-tuned with only 1.2% of the expert demonstrations and a simple reinforcement learning algorithm achieve a 51% absolute improvement in success rate over existing methods in the ALFWorld environment.
Anthology ID:
2021.emnlp-main.734
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9312–9318
URL:
https://aclanthology.org/2021.emnlp-main.734
DOI:
10.18653/v1/2021.emnlp-main.734
Cite (ACL):
Vincent Micheli and Francois Fleuret. 2021. Language Models are Few-Shot Butlers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9312–9318, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Language Models are Few-Shot Butlers (Micheli & Fleuret, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.734.pdf
Video:
https://aclanthology.org/2021.emnlp-main.734.mp4
Code:
vmicheli/lm-butlers