Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step

Liunian Harold Li; Jack Hessel; Youngjae Yu; Xiang Ren; Kai-Wei Chang; Yejin Choi

doi:10.18653/v1/2023.acl-long.150

Abstract

Chain-of-thought prompting (e.g., “Let’s think step-by-step”) primes large language models to verbalize rationalization for their predictions. While chain-of-thought can lead to dramatic performance gains, benefits appear to emerge only for sufficiently large models (beyond 50B parameters). We show that orders-of-magnitude smaller models (125M–1.3B parameters) can still benefit from chain-of-thought prompting. To achieve this, we introduce Symbolic Chain-of-Thought Distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model. Experiments across several commonsense benchmarks show that: 1) SCoTD enhances the performance of the student model in both supervised and few-shot settings, and especially for challenge sets; 2) sampling many reasoning chains per instance from the teacher is paramount; and 3) after distillation, student chains of thought are judged by humans as comparable to the teacher's, despite orders of magnitude fewer parameters. We test several hypotheses regarding what properties of chain-of-thought samples are important, e.g., diversity vs. teacher likelihood vs. open-endedness. We release our corpus of chain-of-thought samples and code.
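The data-construction step the abstract describes (sample several rationalizations per instance from a teacher, then train the student on them) can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: `toy_teacher` is a hypothetical stand-in for a large teacher model, the function name `build_distillation_corpus` and the "Q: ... A: ... So the answer is ..." text format are assumptions made for the example, and the actual fine-tuning of the student on the resulting strings is left out.

```python
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for a large teacher model. A real setup would prompt
# an LLM with chain-of-thought examples and sample with nonzero temperature.
_RNG = random.Random(0)


def toy_teacher(question: str) -> Dict[str, str]:
    """Return one sampled rationale and answer for a question (toy data)."""
    rationale = _RNG.choice([
        "People usually do this for a practical reason.",
        "The most common situation points to one option.",
        "Ruling out the unlikely options leaves one choice.",
    ])
    return {"rationale": rationale, "answer": "no"}


def build_distillation_corpus(
    questions: List[str],
    sample_teacher: Callable[[str], Dict[str, str]],
    chains_per_instance: int,
) -> List[str]:
    """Sample several teacher chains per question and format them as
    plain-text training examples for a smaller student model."""
    corpus = []
    for question in questions:
        for _ in range(chains_per_instance):
            sample = sample_teacher(question)
            # Assumed format: rationale first, then the final answer.
            corpus.append(
                f"Q: {question}\nA: {sample['rationale']} "
                f"So the answer is {sample['answer']}."
            )
    return corpus


corpus = build_distillation_corpus(
    ["Can a fish climb a tree?"], toy_teacher, chains_per_instance=4
)
print(len(corpus))  # 4
```

Raising `chains_per_instance` is the knob that corresponds to the abstract's second finding, that sampling many reasoning chains per instance matters; the value 4 here is arbitrary.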

Anthology ID: 2023.acl-long.150
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2665–2679
URL: https://aclanthology.org/2023.acl-long.150/
DOI: 10.18653/v1/2023.acl-long.150
Cite (ACL): Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, and Yejin Choi. 2023. Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2665–2679, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step (Li et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.150.pdf
Video: https://aclanthology.org/2023.acl-long.150.mp4
