ViLegalLM: Language Models for Vietnamese Legal Text

Truong-Phuc Nguyen; Quy-Nhan Nguyen; Minh-Tien Nguyen

Abstract

We present **ViLegalLM**, comprising **ViLegalBERT** and **ViLegalQwen**, the first suite of Vietnamese pretrained language models for legal text understanding and generation. It includes one encoder-only model (ViLegalBERT, 135M parameters) and two decoder-only models (ViLegalQwen2.5-1.5B-Base and ViLegalQwen3-1.7B-Base), all continually pretrained on a newly curated 16GB Vietnamese legal corpus, significantly larger than the corpora used in previous work. To mitigate data scarcity, we construct three synthetic datasets using LLM-based generation and hard negative mining for True/False QA, Multiple Choice QA, and Natural Language Inference. We establish state-of-the-art results among open-source models on four main Vietnamese legal downstream tasks spanning ten benchmarks, demonstrating that continual pretraining from base models consistently outperforms instruction-tuned adaptation. Source code, corpus, datasets, and model checkpoints are publicly available at https://github.com/ntphuc149/ViLegalLM.

Anthology ID: 2026.findings-acl.1801
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 36136–36150
URL: https://aclanthology.org/2026.findings-acl.1801/
Cite (ACL): Truong-Phuc Nguyen, Quy-Nhan Nguyen, and Minh-Tien Nguyen. 2026. ViLegalLM: Language Models for Vietnamese Legal Text. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36136–36150, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ViLegalLM: Language Models for Vietnamese Legal Text (Nguyen et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1801.pdf
Checklist: 2026.findings-acl.1801.checklist.pdf
