@inproceedings{arora-etal-2019-lstm,
title = "Does an {LSTM} forget more than a {CNN}? An empirical study of catastrophic forgetting in {NLP}",
author = "Arora, Gaurav and
Rahimi, Afshin and
Baldwin, Timothy",
editor = "Mistica, Meladel and
Piccardi, Massimo and
MacKinlay, Andrew",
booktitle = "Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association",
month = "4--6 " # dec,
year = "2019",
address = "Sydney, Australia",
publisher = "Australasian Language Technology Association",
url = "https://aclanthology.org/U19-1011",
pages = "77--86",
abstract = "Catastrophic forgetting {---} whereby a model trained on one task is fine-tuned on a second, and in doing so, suffers a {``}catastrophic{''} drop in performance over the first task {---} is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have limited understanding of how different architectures and hyper-parameters affect forgetting in a network. With this study, we aim to understand factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting compared to LSTMs. We also found that curriculum learning, placing a hard task towards the end of task sequence, reduces forgetting. We analysed the effect of fine-tuning contextual embeddings on catastrophic forgetting and found that using embeddings as feature extractor is preferable to fine-tuning in continual learning setup.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="arora-etal-2019-lstm">
<titleInfo>
<title>Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP</title>
</titleInfo>
<name type="personal">
<namePart type="given">Gaurav</namePart>
<namePart type="family">Arora</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Afshin</namePart>
<namePart type="family">Rahimi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Timothy</namePart>
<namePart type="family">Baldwin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>4–6 December 2019</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association</title>
</titleInfo>
<name type="personal">
<namePart type="given">Meladel</namePart>
<namePart type="family">Mistica</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Massimo</namePart>
<namePart type="family">Piccardi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andrew</namePart>
<namePart type="family">MacKinlay</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Australasian Language Technology Association</publisher>
<place>
<placeTerm type="text">Sydney, Australia</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Catastrophic forgetting — whereby a model trained on one task is fine-tuned on a second, and in doing so, suffers a “catastrophic” drop in performance over the first task — is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have limited understanding of how different architectures and hyper-parameters affect forgetting in a network. With this study, we aim to understand factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting compared to LSTMs. We also found that curriculum learning, placing a hard task towards the end of task sequence, reduces forgetting. We analysed the effect of fine-tuning contextual embeddings on catastrophic forgetting and found that using embeddings as feature extractor is preferable to fine-tuning in continual learning setup.</abstract>
<identifier type="citekey">arora-etal-2019-lstm</identifier>
<location>
<url>https://aclanthology.org/U19-1011</url>
</location>
<part>
<date>4–6 December 2019</date>
<extent unit="page">
<start>77</start>
<end>86</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP
%A Arora, Gaurav
%A Rahimi, Afshin
%A Baldwin, Timothy
%Y Mistica, Meladel
%Y Piccardi, Massimo
%Y MacKinlay, Andrew
%S Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association
%D 2019
%8 4–6 December
%I Australasian Language Technology Association
%C Sydney, Australia
%F arora-etal-2019-lstm
%X Catastrophic forgetting — whereby a model trained on one task is fine-tuned on a second, and in doing so, suffers a “catastrophic” drop in performance over the first task — is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have limited understanding of how different architectures and hyper-parameters affect forgetting in a network. With this study, we aim to understand factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting compared to LSTMs. We also found that curriculum learning, placing a hard task towards the end of task sequence, reduces forgetting. We analysed the effect of fine-tuning contextual embeddings on catastrophic forgetting and found that using embeddings as feature extractor is preferable to fine-tuning in continual learning setup.
%U https://aclanthology.org/U19-1011
%P 77-86
Markdown (Informal)
[Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP](https://aclanthology.org/U19-1011) (Arora et al., ALTA 2019)