LaMemo: Language Modeling with Look-Ahead Memory

Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang


Abstract
Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode context in a uni-directional way. As a result, the memory cannot dynamically interact with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over baselines equipped with different types of memory mechanisms.
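As a rough illustration of the memory update described in the abstract, the PyTorch sketch below refreshes the previous segment's memory by attending to the current ("right-side") tokens and interpolating the result with the old memory states. This is a minimal sketch under assumed shapes and names (look_ahead_memory_update, w_q, w_k, w_v, alpha are all hypothetical), not the authors' implementation; see the thu-coai/lamemo repository linked below for the actual code.

```python
import torch
import torch.nn.functional as F


def look_ahead_memory_update(old_mem, cur_hidden, w_q, w_k, w_v, alpha=0.5):
    """Hypothetical look-ahead memory refresh (illustrative only).

    old_mem:    (mem_len, d)  memory states carried over from earlier segments
    cur_hidden: (seg_len, d)  hidden states of the current segment (right-side tokens)
    w_q/w_k/w_v: (d, d)       placeholder projection matrices
    alpha:                    interpolation weight between refreshed and old memory
    """
    # Memory positions act as queries and "look ahead" at the current segment.
    q = old_mem @ w_q                           # (mem_len, d)
    k = cur_hidden @ w_k                        # (seg_len, d)
    v = cur_hidden @ w_v                        # (seg_len, d)
    scores = q @ k.t() / (q.size(-1) ** 0.5)    # (mem_len, seg_len)
    refreshed = F.softmax(scores, dim=-1) @ v   # (mem_len, d)

    # Interpolate the look-ahead refresh with the old states so long-term
    # information in the history is not simply overwritten.
    return alpha * refreshed + (1.0 - alpha) * old_mem


if __name__ == "__main__":
    d, mem_len, seg_len = 16, 8, 4
    old_mem = torch.randn(mem_len, d)
    cur_hidden = torch.randn(seg_len, d)
    w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
    new_mem = look_ahead_memory_update(old_mem, cur_hidden, w_q, w_k, w_v)
    print(new_mem.shape)  # torch.Size([8, 16])
```

For a fixed segment length, the extra attention in this sketch costs time proportional to the memory length, consistent with the linear overhead the abstract describes.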
Anthology ID:
2022.naacl-main.422
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5747–5762
URL:
https://aclanthology.org/2022.naacl-main.422
DOI:
10.18653/v1/2022.naacl-main.422
Cite (ACL):
Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, and Minlie Huang. 2022. LaMemo: Language Modeling with Look-Ahead Memory. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5747–5762, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
LaMemo: Language Modeling with Look-Ahead Memory (Ji et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.422.pdf
Video:
https://aclanthology.org/2022.naacl-main.422.mp4
Code:
thu-coai/lamemo
Data:
WikiText-103, WikiText-2