Nicholas I-Hsien Kuo
2024
Hands-On NLP with Hugging Face: ALTA 2024 Tutorial on Efficient Fine-Tuning and Quantisation
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association
This tutorial, presented at ALTA 2024, focuses on efficient fine-tuning and quantisation techniques for large language models (LLMs), addressing the challenge of deploying state-of-the-art models on resource-constrained hardware. It introduces parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), and model quantisation strategies that enable training and inference of LLMs on GPUs with limited memory (e.g., 16 GB VRAM). Participants will work with TinyLlama (1.1B) and the public-domain text War and Peace as an accessible dataset, avoiding barriers such as credentialled access to Hugging Face or PhysioNet datasets. The tutorial also demonstrates common training challenges, such as OutOfMemoryError, and shows how PEFT mitigates these issues, enabling large-scale fine-tuning even in resource-limited environments.
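
A minimal sketch of the kind of workflow the abstract describes: loading TinyLlama (1.1B) in 4-bit quantised form and attaching a LoRA adapter with the Hugging Face transformers, peft, bitsandbytes, and datasets libraries, then fine-tuning on a plain-text corpus such as War and Peace. The checkpoint name, local file path, and hyperparameters are illustrative assumptions, not the tutorial's exact settings.

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

# Load the base model in 4-bit NF4 so it fits comfortably within ~16 GB VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Attach a LoRA adapter so only a small fraction of parameters are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Tokenise the raw text for causal language modelling.
dataset = load_dataset("text", data_files={"train": "war_and_peace.txt"})  # assumed local file

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tinyllama-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

Without the quantisation config and LoRA adapter, full fine-tuning of even a 1.1B-parameter model can exhaust a 16 GB GPU and raise the OutOfMemoryError mentioned above; the small per-device batch size with gradient accumulation is a further memory-saving measure.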