Hard Non-Monotonic Attention for Character-Level Transduction

Shijie Wu, Pamela Shapiro, Ryan Cotterell


Abstract
Character-level string-to-string transduction is an important component of various NLP tasks. The goal is to map an input string to an output string, where the strings may be of different lengths and have characters taken from different alphabets. Recent approaches have used sequence-to-sequence models with an attention mechanism to learn which parts of the input string the model should focus on during the generation of the output string. Both soft attention and hard monotonic attention have been used, but hard non-monotonic attention has only been used in other sequence modeling tasks and has required a stochastic approximation to compute the gradient. In this work, we introduce an exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings, showing that hard attention models can be viewed as neural reparameterizations of the classical IBM Model 1. We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the stochastic approximation and outperforms soft attention.
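The abstract says the sum over all non-monotonic alignments is exact and polynomial-time. The sketch below shows why, under one assumption: as in IBM Model 1, the alignment a_i chosen for output position i is independent of the other alignments, given the input x and the output history y_<i. Under that assumption, the sum over all |x|^|y| alignments distributes into a product of per-position sums:

p(y | x) = ∏_i Σ_j p(a_i = j | y_<i, x) · p(y_i | a_i = j, y_<i, x)

This costs O(|x|·|y|) instead of O(|x|^|y|). In the code, `align_probs[i][j]` and `emit_probs[i][j]` are hypothetical lists that stand in for the two factors a neural model would produce. The toy numbers are invented for illustration, and this is not the authors' implementation. The brute-force enumeration is there only to check that the two computations agree.

```python
import itertools
import math


def exact_marginal(align_probs, emit_probs):
    """Marginal likelihood p(y | x) in O(|x| * |y|).

    align_probs[i][j]: hypothetical p(a_i = j | y_<i, x)
    emit_probs[i][j]:  hypothetical p(y_i | a_i = j, y_<i, x)
    Assumes each a_i is independent of the other alignments, so the
    sum over alignments factorizes into one sum per output position.
    """
    total = 1.0
    for a_row, e_row in zip(align_probs, emit_probs):
        # Marginalize the alignment of output position i over all inputs j.
        total *= sum(a * e for a, e in zip(a_row, e_row))
    return total


def brute_force_marginal(align_probs, emit_probs):
    """Same quantity by enumerating all |x| ** |y| alignments (exponential)."""
    n_out = len(align_probs)
    n_in = len(align_probs[0]) if align_probs else 0
    total = 0.0
    # Each alignment maps every output position to any input position,
    # with no monotonicity constraint.
    for alignment in itertools.product(range(n_in), repeat=n_out):
        p = 1.0
        for i, j in enumerate(alignment):
            p *= align_probs[i][j] * emit_probs[i][j]
        total += p
    return total


if __name__ == "__main__":
    # Toy example: |x| = 3 input characters, |y| = 2 output characters.
    align_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # each row sums to 1
    emit_probs = [[0.9, 0.05, 0.05], [0.1, 0.6, 0.3]]
    exact = exact_marginal(align_probs, emit_probs)
    brute = brute_force_marginal(align_probs, emit_probs)
    assert math.isclose(exact, brute)
    print(f"exact={exact:.4f} brute={brute:.4f}")
```

In a real model these products underflow quickly for long strings. A practical version would therefore work in log space, taking a log-sum-exp per output position and then summing over positions. The plain-probability form is kept here for readability.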
Anthology ID: D18-1473
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4425–4438
URL: https://aclanthology.org/D18-1473
DOI: 10.18653/v1/D18-1473
Cite (ACL): Shijie Wu, Pamela Shapiro, and Ryan Cotterell. 2018. Hard Non-Monotonic Attention for Character-Level Transduction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4425–4438, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Hard Non-Monotonic Attention for Character-Level Transduction (Wu et al., EMNLP 2018)
PDF: https://aclanthology.org/D18-1473.pdf
Code: shijie-wu/neural-transducer