Towards Realistic Single-Task Continuous Learning Research for NER

Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, Rahul Gupta


Abstract
Interest in continuous learning (CL) is growing as data privacy becomes a priority for real-world machine learning applications. However, academic NLP benchmarks applicable to realistic CL settings are still lacking, which is a major obstacle to progress in the field. In this paper, we discuss some of the unrealistic data characteristics of public datasets, and study both the challenges of realistic single-task continuous learning and the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it, along with the code, to the research community.
Anthology ID:
2021.findings-emnlp.319
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3773–3783
URL:
https://aclanthology.org/2021.findings-emnlp.319
DOI:
10.18653/v1/2021.findings-emnlp.319
Cite (ACL):
Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, and Rahul Gupta. 2021. Towards Realistic Single-Task Continuous Learning Research for NER. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3773–3783, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Towards Realistic Single-Task Continuous Learning Research for NER (Payan et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.319.pdf
Software:
 2021.findings-emnlp.319.Software.zip
Video:
 https://aclanthology.org/2021.findings-emnlp.319.mp4
Code:
 justinpayan/stackoverflowner-ns