Andrew Maurer
2022
Domain-specific knowledge distillation yields smaller and better models for conversational commerce
Kristen Howell | Jian Wang | Akshay Hazare | Joseph Bradley | Chris Brew | Xi Chen | Matthew Dunn | Beth Hockey | Andrew Maurer | Dominic Widdows
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
We demonstrate that knowledge distillation can be used not only to reduce model size, but also to adapt a contextual language model to a specific domain at the same time. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model achieves on average a 2.3% improvement in F1 score relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.