Title: Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation
Authors: Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi
Date: 2021-11
Type: conference publication (text)
Published in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Publisher: Association for Computational Linguistics
Location: Online and Punta Cana, Dominican Republic
Pages: 7649-7661
Citation key: wu-etal-2021-universal
DOI: 10.18653/v1/2021.emnlp-main.603
URL: https://aclanthology.org/2021.emnlp-main.603/