Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency

Lovisa Hagström, Richard Johansson


Abstract
The current recipe for better model performance in NLP is to increase model size and training data. While this yields models with increasingly impressive results, it also makes state-of-the-art NLP models more difficult to train and deploy due to growing computational costs. Model compression is a field of research that aims to alleviate this problem. The field encompasses different methods that aim to preserve the performance of a model while decreasing its size. One such method is knowledge distillation. In this article, we investigate the effect of knowledge distillation for named entity recognition models in Swedish. We show that while some sequence tagging models benefit from knowledge distillation, not all models do. This prompts us to ask in which situations, and for which models, knowledge distillation is beneficial. We also reason about the effect of knowledge distillation on computational costs.
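
The paper itself details the distillation setups that were evaluated; as a generic illustration of the underlying idea, the sketch below implements the standard soft-label distillation objective (interpolating a temperature-softened teacher-student KL term with ordinary cross-entropy) for a token-level tagger in PyTorch. The temperature, interpolation weight, and tensor shapes are illustrative assumptions, not values taken from the paper.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, gold_labels,
                          temperature=2.0, alpha=0.5):
        # Soft-target term: KL divergence between the temperature-softened
        # teacher and student distributions, scaled by T^2 as is customary.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        # Hard-target term: ordinary cross-entropy against the gold NER tags.
        hard = F.cross_entropy(student_logits, gold_labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Dummy example: 8 tokens, 9 possible BIO tags (shapes are illustrative).
    student_logits = torch.randn(8, 9, requires_grad=True)
    teacher_logits = torch.randn(8, 9)
    gold_labels = torch.randint(0, 9, (8,))
    loss = distillation_loss(student_logits, teacher_logits, gold_labels)
    loss.backward()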
Anthology ID:
2021.nodalida-main.13
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May 31 – June 2
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
124–134
URL:
https://aclanthology.org/2021.nodalida-main.13
Cite (ACL):
Lovisa Hagström and Richard Johansson. 2021. Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 124–134, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency (Hagström & Johansson, NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.13.pdf
Data
CoNLL 2003