Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, and Alexander Min. 2023. A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models. In Findings of the Association for Computational Linguistics: ACL 2023 (editors: Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki), pages 11239-11246, Toronto, Canada, July 2023. Association for Computational Linguistics. Conference publication.
Anthology ID: lee-etal-2023-study
DOI: 10.18653/v1/2023.findings-acl.714
URL: https://aclanthology.org/2023.findings-acl.714/