Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali
Sharad Duwal | Suraj Prasai | Suresh Manandhar
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Continual learning has emerged as an important research direction because retraining large language models (LLMs) from scratch whenever new data becomes available is infeasible. Of particular interest is the domain-adaptive pre-training (DAPT) paradigm, which continues training a pre-trained language model to adapt it to a domain it was not originally trained on. In this work, we evaluate the feasibility of DAPT in a low-resource setting, namely the Nepali language. We use synthetic data to continue training Llama 3 8B in a 4-bit QLoRA setting to adapt it to Nepali. We evaluate the adapted model on its performance, catastrophic forgetting, and knowledge acquisition. We compare the base and final models on their Nepali generation abilities and their performance on popular benchmarks, and run case studies to probe their linguistic knowledge of Nepali. Using GPT-4o as an evaluator, we establish that the final model has learned to generate Nepali. We observe some unsurprising forgetting in the final model, but also find, surprisingly, that increasing the number of shots during evaluation yields a larger relative improvement for the final model (up to 19.29%) than for the base model (4.98%), suggesting latent retention. We also examine layer–head self-attention heatmaps to establish the dependency resolution abilities of the final model in Nepali. We open-source the model and the code.
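The sketch below illustrates the kind of 4-bit QLoRA continual pre-training setup the abstract describes: load Llama 3 8B with NF4 quantization, attach LoRA adapters, and continue training with a plain causal-LM objective on Nepali text. The hyperparameters (LoRA rank, target modules, sequence length), the local corpus path `nepali_corpus.txt`, and the output directory are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal QLoRA continual pre-training sketch (assumed hyperparameters, not the
# authors' exact setup). Requires: transformers, peft, bitsandbytes, datasets.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Meta-Llama-3-8B"

# Load the base model with 4-bit NF4 quantization (the QLoRA setting).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these parameters are updated during DAPT.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Causal-LM objective over Nepali text (hypothetical local corpus file).
dataset = load_dataset("text", data_files={"train": "nepali_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama3-8b-nepali-dapt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```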