Title: Investigating Acceleration of LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with ‘LITE’
Authors: Neeraj Varshney, Agneet Chatterjee, Mihir Parmar, Chitta Baral
Date: 2024-06
Type: text (conference publication)
Published in: Findings of the Association for Computational Linguistics: NAACL 2024
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics
Location: Mexico City, Mexico
Pages: 3656-3677
Citation key: varshney-etal-2024-investigating
DOI: 10.18653/v1/2024.findings-naacl.232
URL: https://aclanthology.org/2024.findings-naacl.232/