PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation

Clement Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari


Abstract
In language generation models conditioned on structured data, classical maximum-likelihood training almost always leads models to pick up on dataset divergences (i.e., hallucinations or omissions) and to reproduce them erroneously in their own generations at inference time. In this work, we build on previous Reinforcement Learning based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is effective at reducing both hallucinations and omissions. Evaluations on the widely used WikiBIO and WebNLG benchmarks demonstrate the benefits of this framework over state-of-the-art models.
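The core idea of reward-based fine-tuning with a faithfulness metric can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain REINFORCE with a running-mean baseline, a toy unconditional categorical policy in place of a sequence-to-sequence generator, and a hypothetical `toy_parent_reward` function (fraction of generated tokens supported by the source table) standing in for the actual PARENT metric.

```python
import math
import random

random.seed(0)

# Toy vocabulary and a source "table" of facts; the desired behaviour is to
# emit only tokens supported by the table (no hallucinations).
VOCAB = ["paris", "london", "1990", "2001", "<eos>"]
TABLE = {"paris", "1990"}

def toy_parent_reward(tokens):
    """Hypothetical simplification of PARENT: the fraction of generated
    content tokens that are supported by the table."""
    content = [t for t in tokens if t != "<eos>"]
    if not content:
        return 0.0
    return sum(t in TABLE for t in content) / len(content)

# An unconditional categorical "policy" over the vocabulary,
# parameterised by one logit per token.
logits = [0.0] * len(VOCAB)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def sample_sequence(max_len=4):
    seq = []
    for _ in range(max_len):
        probs = softmax(logits)
        i = random.choices(range(len(VOCAB)), weights=probs)[0]
        seq.append(i)
        if VOCAB[i] == "<eos>":
            break
    return seq

# REINFORCE: sample a sequence, score it with the reward, and push up the
# log-probability of its tokens in proportion to (reward - baseline).
baseline, lr = 0.0, 0.5
for step in range(2000):
    seq = sample_sequence()
    r = toy_parent_reward([VOCAB[i] for i in seq])
    advantage = r - baseline
    baseline += 0.05 * (r - baseline)  # running-mean baseline
    probs = softmax(logits)
    for i in seq:
        # Gradient of log p(i) w.r.t. the logits is onehot(i) - probs.
        for j in range(len(VOCAB)):
            g = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * g

probs = softmax(logits)
supported = sum(p for t, p in zip(VOCAB, probs) if t in TABLE)
print(round(supported, 2))  # probability mass on table-supported tokens
```

Because the reward only needs to score sampled sequences, the same training loop applies to any generator and any scalar metric, which is what makes the approach model-agnostic.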
Anthology ID:
2020.inlg-1.18
Volume:
Proceedings of the 13th International Conference on Natural Language Generation
Month:
December
Year:
2020
Address:
Dublin, Ireland
Editors:
Brian Davis, Yvette Graham, John Kelleher, Yaji Sripada
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
120–130
URL:
https://aclanthology.org/2020.inlg-1.18
DOI:
10.18653/v1/2020.inlg-1.18
Cite (ACL):
Clement Rebuffel, Laure Soulier, Geoffrey Scoutheeten, and Patrick Gallinari. 2020. PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation. In Proceedings of the 13th International Conference on Natural Language Generation, pages 120–130, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation (Rebuffel et al., INLG 2020)
PDF:
https://aclanthology.org/2020.inlg-1.18.pdf
Code
KaijuML/PARENTing-rl
Data
WikiBio