In sequence modeling, certain tokens are usually less ambiguous than others, and representations of these tokens require fewer refinements for disambiguation. However, given the nature of attention-based models like Transformer and UT (universal transformer), all tokens are equally processed towards depth. Inspired by the equilibrium phenomenon, we present a lazy transition, a mechanism to adjust the significance of iterative refinements for each token representation. Our lazy transition is deployed on top of UT to build LT (lazy transformer), where all tokens are processed unequally towards depth. Eventually, LT is encouraged to oscillate around a relaxed equilibrium. Our experiments show that LT outperforms baseline models on several tasks of machine translation, pre-training, Learning to Execute, and LAMBADA.
Almost Free Semantic Draft for Neural Machine Translation
Xi Ai | Bin Fang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Translation quality can be improved by global information from the required target sentence because the decoder can understand both past and future information. However, the model needs additional cost to produce and consider such global information. In this work, to inject global information but also save cost, we present an efficient method to sample and consider a semantic draft as global information from semantic space for decoding with almost free of cost. Unlike other successful adaptations, we do not have to perform an EM-like process that repeatedly samples a possible semantic from the semantic space. Empirical experiments show that the presented method can achieve competitive performance in common language pairs with a clear advantage in inference efficiency. We will open all our source code on GitHub.