LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji


Abstract
Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. While similar work exists, it is often conducted on a limited set of (usually still large) models and is not accompanied by proper evaluation. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly-generated instructions. In addition to being sizable, we design our instructions to cover a broad set of topics to ensure diversity. Extensive analysis of our instruction dataset confirms its diversity, and we generate responses for these instructions using gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families, with varying sizes. We evaluate the performance of our models using automatic metrics on 15 different natural language processing (NLP) benchmarks, as well as through human assessment. We also assess the models for hallucination and toxicity, and for the former, we introduce a new benchmark dataset for hallucination-inducing QA. The results demonstrate that our proposed LaMini-LM models are comparable to strong baselines while being much smaller in size.
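The abstract describes a two-step recipe: collect responses from an instruction-tuned teacher (gpt-3.5-turbo) for a large instruction set, then fine-tune much smaller student models on the resulting pairs. The sketch below illustrates that recipe in miniature; it is not the authors' released code, and the student model name, hyperparameters, and OpenAI client usage are illustrative assumptions.

```python
# Minimal sketch of sequence-level distillation from an instruction-tuned
# teacher into a small encoder-decoder student, under assumed settings.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# --- Step 1: distill responses from the teacher (gpt-3.5-turbo) -------------
def get_teacher_response(instruction: str) -> str:
    # Requires OPENAI_API_KEY; uses the v1 openai Python client. Adjust for
    # your client version.
    from openai import OpenAI
    client = OpenAI()
    out = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
    )
    return out.choices[0].message.content

instructions = ["Explain knowledge distillation in one sentence."]
pairs = [{"instruction": i, "response": get_teacher_response(i)}
         for i in instructions]

# --- Step 2: fine-tune a small student on the instruction-response pairs ----
student_name = "google/flan-t5-small"  # placeholder small encoder-decoder
tokenizer = AutoTokenizer.from_pretrained(student_name)
model = AutoModelForSeq2SeqLM.from_pretrained(student_name)

def tokenize(batch):
    enc = tokenizer(batch["instruction"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["response"],
                       truncation=True, max_length=512)
    enc["labels"] = labels["input_ids"]
    return enc

train_ds = Dataset.from_list(pairs).map(
    tokenize, batched=True, remove_columns=["instruction", "response"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="lamini-student",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

In practice the paper applies this idea at scale (2.58M instructions, multiple encoder-decoder and decoder-only students); the sketch only fixes the shape of the pipeline.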
Anthology ID:
2024.eacl-long.57
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
944–964
URL:
https://aclanthology.org/2024.eacl-long.57
Cite (ACL):
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, and Alham Fikri Aji. 2024. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 944–964, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions (Wu et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.57.pdf