Hands-On NLP with Hugging Face: ALTA 2024 Tutorial on Efficient Fine-Tuning and Quantisation

Nicholas I-Hsien Kuo


Abstract
This tutorial, presented at ALTA 2024, focuses on efficient fine-tuning and quantisation techniques for large language models (LLMs), addressing the challenge of deploying state-of-the-art models on resource-constrained hardware. It introduces parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), and model quantisation strategies, which enable training and inference of LLMs on GPUs with limited memory (e.g., 16 GB VRAM). Participants work with TinyLlama (1.1B) and the public-domain text War and Peace as an accessible dataset, so there are no barriers such as credentialled access to Hugging Face or PhysioNet datasets. The tutorial also demonstrates common training challenges, such as OutOfMemoryError, and shows how PEFT can mitigate these issues, enabling large-scale fine-tuning even in resource-limited environments.
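As a rough illustration of the setup the abstract describes (not taken from the tutorial materials themselves), the sketch below loads a TinyLlama checkpoint with 4-bit quantisation via bitsandbytes and attaches LoRA adapters with the PEFT library; the checkpoint name, LoRA rank, and target modules are illustrative assumptions.

```python
# Minimal sketch: 4-bit quantised TinyLlama with LoRA adapters (assumed settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint

# 4-bit quantisation keeps the base weights within a limited VRAM budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA trains small low-rank adapter matrices instead of all 1.1B parameters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```

The resulting model can then be passed to a standard Hugging Face Trainer over a tokenised text corpus (e.g., War and Peace), with only the adapter weights updated during fine-tuning.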
Anthology ID:
2024.alta-1.20
Volume:
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2024
Address:
Canberra, Australia
Editors:
Tim Baldwin, Sergio José Rodríguez Méndez, Nicholas Kuo
Venue:
ALTA
Publisher:
Association for Computational Linguistics
Pages:
213
URL:
https://aclanthology.org/2024.alta-1.20/
Cite (ACL):
Nicholas I-Hsien Kuo. 2024. Hands-On NLP with Hugging Face: ALTA 2024 Tutorial on Efficient Fine-Tuning and Quantisation. In Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association, pages 213–213, Canberra, Australia. Association for Computational Linguistics.
Cite (Informal):
Hands-On NLP with Hugging Face: ALTA 2024 Tutorial on Efficient Fine-Tuning and Quantisation (Kuo, ALTA 2024)
PDF:
https://aclanthology.org/2024.alta-1.20.pdf