Learning Natural Language Generation with Truncated Reinforcement Learning

Alice Martin; Guillaume Quispe; Charles Ollion; Sylvain Le Corff; Florian Strub; Olivier Pietquin

doi:10.18653/v1/2022.naacl-main.2

Abstract

This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to training conditional language models without a supervised learning phase, using only reinforcement learning (RL). Because RL methods scale poorly to large action spaces, we dynamically truncate the vocabulary space using a generic language model. TrufLL thus makes it possible to train a language agent solely through interaction with its environment, without any task-specific prior knowledge; the agent is guided only by a task-agnostic language model. Interestingly, this approach avoids the dependency on labelled datasets and inherently reduces flaws of pretrained policies such as language or exposure biases. We evaluate TrufLL on two visual question generation tasks, for which we report positive results on performance and language metrics, which we then corroborate with a human evaluation. To our knowledge, it is the first approach that successfully learns a language generation policy without pre-training, using only reinforcement learning.
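
As a rough illustration of the truncation idea in the abstract, the sketch below restricts an RL policy's action space to the tokens a language model rates most likely at the current step. It is a minimal toy under stated assumptions, not the authors' implementation: the top-k rule, the function names (`truncate_vocabulary`, `truncated_policy`, `sample_action`), and all numbers are hypothetical, and the abstract does not say which truncation rule TrufLL uses.

```python
import math
import random


def truncate_vocabulary(lm_probs, k):
    """Return the indices of the k tokens the language model rates most likely.

    Top-k is an assumed truncation rule, chosen here only for illustration.
    """
    ranked = sorted(range(len(lm_probs)), key=lambda i: lm_probs[i], reverse=True)
    return sorted(ranked[:k])


def truncated_policy(policy_logits, allowed):
    """Softmax over the policy logits, restricted to the allowed token indices."""
    m = max(policy_logits[i] for i in allowed)  # subtract the max for stability
    exps = {i: math.exp(policy_logits[i] - m) for i in allowed}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def sample_action(dist, rng):
    """Sample one token index from the truncated distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy step: a six-word vocabulary with made-up language model probabilities
# and made-up policy logits.
vocab = ["what", "color", "is", "the", "banana", "?"]
lm_probs = [0.40, 0.05, 0.30, 0.20, 0.01, 0.04]
policy_logits = [0.1, 2.0, 0.3, -0.5, 3.0, 0.0]

allowed = truncate_vocabulary(lm_probs, k=3)   # tokens the LM keeps
dist = truncated_policy(policy_logits, allowed)  # policy over kept tokens only
action = sample_action(dist, random.Random(0))
print([vocab[i] for i in allowed], vocab[action])
```

In this toy step the policy's highest logit belongs to "banana", but the language model assigns it a low probability, so it is excluded from the truncated action space and the agent can only pick among the three retained words.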

Anthology ID: 2022.naacl-main.2
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 12–37
URL: https://aclanthology.org/2022.naacl-main.2/
DOI: 10.18653/v1/2022.naacl-main.2
Cite (ACL): Alice Martin, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, and Olivier Pietquin. 2022. Learning Natural Language Generation with Truncated Reinforcement Learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 12–37, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Learning Natural Language Generation with Truncated Reinforcement Learning (Martin et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.2.pdf
Software: 2022.naacl-main.2.software.zip
Video: https://aclanthology.org/2022.naacl-main.2.mp4