Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP

Gaurav Arora, Afshin Rahimi, Timothy Baldwin


Abstract
Catastrophic forgetting — whereby a model trained on one task is fine-tuned on a second, and in doing so, suffers a “catastrophic” drop in performance over the first task — is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have limited understanding of how different architectures and hyper-parameters affect forgetting in a network. With this study, we aim to understand factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting compared to LSTMs. We also find that curriculum learning, i.e. placing a hard task towards the end of the task sequence, reduces forgetting. Finally, we analyse the effect of fine-tuning contextual embeddings on catastrophic forgetting, and find that using the embeddings as a feature extractor is preferable to fine-tuning them in a continual learning setup.
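The sequential-training setup the abstract describes can be illustrated with a minimal sketch. This is not the paper's code: the toy two-blob tasks, the logistic-regression model, and all numbers below are hypothetical stand-ins, used only to show how forgetting is typically quantified — train on task A, fine-tune on a conflicting task B, and measure the drop in task-A accuracy.

```python
import numpy as np

# Hypothetical sketch (not the paper's experiments): measure catastrophic
# forgetting as the drop in task-A accuracy after fine-tuning on task B.
# The model is a toy logistic-regression classifier trained by gradient descent.

rng = np.random.default_rng(0)

def make_task(offset):
    """Two Gaussian blobs; `offset` shifts the decision boundary so that
    the two tasks conflict, which induces forgetting."""
    neg = rng.normal(loc=-1.0 + offset, scale=0.5, size=(200, 2))
    pos = rng.normal(loc=1.0 + offset, scale=0.5, size=(200, 2))
    X = np.vstack([neg, pos])
    y = np.array([0] * 200 + [1] * 200)
    return X, y

def train(w, b, X, y, lr=0.1, epochs=500):
    """Plain gradient descent on the logistic loss."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        g = p - y                               # gradient of loss w.r.t. logits
        w = w - lr * (X.T @ g) / len(y)
        b = b - lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0).astype(int) == y).mean())

task_a = make_task(offset=0.0)
task_b = make_task(offset=3.0)  # shifted so its boundary conflicts with A's

w, b = np.zeros(2), 0.0
w, b = train(w, b, *task_a)
acc_a_before = accuracy(w, b, *task_a)

w, b = train(w, b, *task_b)     # sequential fine-tuning on task B
acc_a_after = accuracy(w, b, *task_a)

forgetting = acc_a_before - acc_a_after
print(f"task A accuracy before/after fine-tuning on B: "
      f"{acc_a_before:.2f}/{acc_a_after:.2f} (forgetting {forgetting:.2f})")
```

Because the two tasks' decision boundaries conflict, fine-tuning on B overwrites the parameters that solved A, and task-A accuracy collapses — the phenomenon the paper studies across LSTM and CNN architectures.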
Anthology ID:
U19-1011
Volume:
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association
Month:
4–6 December
Year:
2019
Address:
Sydney, Australia
Editors:
Meladel Mistica, Massimo Piccardi, Andrew MacKinlay
Venue:
ALTA
Publisher:
Australasian Language Technology Association
Pages:
77–86
URL:
https://aclanthology.org/U19-1011
Cite (ACL):
Gaurav Arora, Afshin Rahimi, and Timothy Baldwin. 2019. Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association, pages 77–86, Sydney, Australia. Australasian Language Technology Association.
Cite (Informal):
Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP (Arora et al., ALTA 2019)
PDF:
https://aclanthology.org/U19-1011.pdf