Investigating Acceleration of LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with ‘LITE’

Authors: Neeraj Varshney, Agneet Chatterjee, Mihir Parmar, Chitta Baral
Published in: Findings of the Association for Computational Linguistics: NAACL 2024
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics
Location: Mexico City, Mexico
Date: 2024-06
Genre: conference publication
Pages: 3656-3677
Citation key: varshney-etal-2024-investigating
DOI: 10.18653/v1/2024.findings-naacl.232
URL: https://aclanthology.org/2024.findings-naacl.232/