Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning

Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio


Abstract
We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention. We develop a model that plans ahead when computing alignments between the source and target sequences: rather than aligning only the current time-step, it constructs a matrix of proposed alignments for the next k time-steps together with a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model, a recent neural architecture for planning with hierarchical reinforcement learning that can also learn higher-level temporal abstractions. Our proposed model is end-to-end trainable with differentiable operations. We show that it outperforms strong baselines on a character-level translation task from WMT’15 with fewer parameters, and that it computes alignments that are qualitatively intuitive.
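The mechanism described in the abstract can be pictured as an attention module that maintains an alignment plan over the next k time-steps plus a commitment vector deciding when to re-plan. Below is a minimal NumPy sketch of that idea; the class name `PlanningAttention`, the tensor shapes, the scoring projection `W`, and the hard 0/1 commitment update are illustrative assumptions, not the paper's exact (fully differentiable) formulation.

```python
# Minimal sketch of planning attention: keep a (k x src_len) alignment plan
# and a commitment vector; either follow the plan (shift it) or recompute it.
# Names, shapes, and the hard commitment rule are assumptions for illustration.
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PlanningAttention:
    def __init__(self, k, src_len, dim, rng=None):
        self.k = k                                # planning horizon
        self.rng = rng or np.random.default_rng(0)
        self.W = self.rng.normal(scale=0.1, size=(dim, dim))  # hypothetical scorer
        self.plan = np.zeros((k, src_len))        # proposed future alignments
        self.commit = np.zeros(k)                 # commitment plan
        self.commit[0] = 1.0                      # force a re-plan at step 0

    def _new_plan(self, dec_state, src_annotations):
        # Score every source position against the decoder state, once per
        # future step; the per-step variation here is just illustrative noise.
        scores = src_annotations @ self.W @ dec_state                  # (src_len,)
        per_step = scores[None, :] + 0.01 * self.rng.normal(size=(self.k, scores.size))
        return softmax(per_step, axis=-1)                              # (k, src_len)

    def step(self, dec_state, src_annotations):
        if self.commit[0] > 0.5:
            # Commitment fires: recompute the whole alignment plan.
            self.plan = self._new_plan(dec_state, src_annotations)
            self.commit = np.zeros(self.k)
            self.commit[-1] = 1.0                 # re-plan when the horizon ends
        else:
            # Follow the existing plan: shift it forward by one time-step.
            self.plan = np.roll(self.plan, -1, axis=0)
            self.commit = np.roll(self.commit, -1)
        alignment = self.plan[0]                  # alignment used at this step
        context = alignment @ src_annotations     # attention context vector
        return alignment, context


# Toy usage: 12 source annotations of size 32, planning 4 steps ahead.
src = np.random.default_rng(1).normal(size=(12, 32))
attn = PlanningAttention(k=4, src_len=12, dim=32)
for t in range(6):
    a, ctx = attn.step(np.random.default_rng(t).normal(size=32), src)
    print(t, a.argmax())
```

In the paper, both the alignment plan and the commitment plan are updated with learned, differentiable operations, so the full model trains end-to-end; the hard if/else above only mimics that behavior for readability.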
Anthology ID:
W17-2627
Volume:
Proceedings of the 2nd Workshop on Representation Learning for NLP
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Phil Blunsom, Antoine Bordes, Kyunghyun Cho, Shay Cohen, Chris Dyer, Edward Grefenstette, Karl Moritz Hermann, Laura Rimell, Jason Weston, Scott Yih
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
228–234
URL:
https://aclanthology.org/W17-2627
DOI:
10.18653/v1/W17-2627
Cite (ACL):
Caglar Gulcehre, Francis Dutil, Adam Trischler, and Yoshua Bengio. 2017. Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 228–234, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning (Gulcehre et al., RepL4NLP 2017)
PDF:
https://aclanthology.org/W17-2627.pdf