James J. Clark
2024
Intermediate Layer Distillation with the Reused Teacher Classifier: A Study on the Importance of the Classifier of Attention-based Models
Hang Zhang | Seyyed Hasan Mozafari | James J. Clark | Brett H. Meyer | Warren J. Gross
Findings of the Association for Computational Linguistics: EMNLP 2024