Towards Realistic Single-Task Continuous Learning Research for NER
Justin Payan | Yuval Merhav | He Xie | Satyapriya Krishna | Anil Ramakrishna | Mukund Sridhar | Rahul Gupta
Findings of the Association for Computational Linguistics: EMNLP 2021
Interest in continuous learning (CL) is growing as data privacy becomes a priority for real-world machine learning applications. Meanwhile, academic NLP benchmarks that apply to realistic CL settings are still lacking, which is a major obstacle to progress in the field. In this paper, we discuss some of the unrealistic data characteristics of public datasets and study both the challenges of realistic single-task continuous learning and the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it, along with the code, to the research community.
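As a rough illustration of what data rehearsal can look like in general, the sketch below mixes each new batch of training examples with a random sample of previously stored ones before a model update. This is not the procedure from the paper, whose details are not given in the abstract. The function names `mix_with_rehearsal` and `update_buffer`, the `rehearsal_ratio` parameter, and the reservoir-sampling buffer are all assumptions made for illustration.

```python
import random


def mix_with_rehearsal(new_examples, buffer, rehearsal_ratio=0.5, rng=None):
    """Return the new examples plus a random sample of stored old examples.

    rehearsal_ratio is a hypothetical knob: the number of replayed examples
    is this fraction of the new batch size, capped by the buffer size.
    """
    rng = rng or random.Random()
    n_replay = min(len(buffer), int(len(new_examples) * rehearsal_ratio))
    replayed = rng.sample(buffer, n_replay)
    mixed = list(new_examples) + replayed
    rng.shuffle(mixed)  # avoid presenting all old examples at the end
    return mixed


def update_buffer(buffer, new_examples, capacity, seen_so_far, rng=None):
    """Reservoir sampling: keep a uniform sample of every example seen.

    Mutates buffer in place and returns the updated count of examples seen.
    """
    rng = rng or random.Random()
    for example in new_examples:
        seen_so_far += 1
        if len(buffer) < capacity:
            buffer.append(example)
        else:
            # Replace a stored example with probability capacity / seen_so_far.
            slot = rng.randrange(seen_so_far)
            if slot < capacity:
                buffer[slot] = example
    return seen_so_far
```

In a training loop, one would call `mix_with_rehearsal` on each incoming batch, train on the result, and then call `update_buffer` with the same batch. The buffer capacity then bounds how much earlier data is retained.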