ViSoLex: An Open-Source Repository for Vietnamese Social Media Lexical Normalization

Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Kiet Van Nguyen


Abstract
ViSoLex is an open-source system designed to address the unique challenges of lexical normalization for Vietnamese social media text. The platform provides two core services: Non-Standard Word (NSW) Lookup and Lexical Normalization, enabling users to retrieve standard forms of informal language and standardize text containing NSWs. ViSoLex’s architecture integrates pre-trained language models and weakly supervised learning techniques to ensure accurate and efficient normalization, overcoming the scarcity of labeled data in Vietnamese. This paper details the system’s design, functionality, and applications for researchers and non-technical users. Additionally, ViSoLex offers a flexible, customizable framework that can be adapted to various datasets and research requirements. By publishing the source code, ViSoLex aims to contribute to the development of more robust Vietnamese natural language processing tools and encourage further research in lexical normalization. Future directions include expanding the system’s capabilities for additional languages and improving the handling of more complex non-standard linguistic patterns.
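To make the two services named in the abstract concrete, the sketch below shows what NSW lookup and lexical normalization mean at the input/output level. It is only an illustration: the lexicon, the function names `lookup_nsw` and `normalize`, and the plain dictionary lookup are all assumptions made here, and they are not ViSoLex's API or method. The abstract describes a system built on pre-trained language models and weakly supervised learning, which this toy example does not attempt to reproduce.

```python
import re

# Hypothetical toy lexicon of common Vietnamese social media abbreviations.
# This is NOT ViSoLex's dictionary; it exists only to illustrate the task.
NSW_LEXICON = {
    "ko": "không",
    "dc": "được",
    "đc": "được",
    "j": "gì",
}


def lookup_nsw(token):
    """NSW lookup: return the standard form of a token, or None if unknown."""
    return NSW_LEXICON.get(token.lower())


def normalize(text):
    """Lexical normalization: replace each known NSW with its standard form.

    Tokens that are not in the lexicon, as well as punctuation and spacing,
    are left unchanged.
    """
    def replace(match):
        word = match.group(0)
        return lookup_nsw(word) or word

    return re.sub(r"\w+", replace, text)


if __name__ == "__main__":
    print(lookup_nsw("ko"))
    print(normalize("mình ko biết"))
```

A pure lookup table cannot resolve abbreviations whose standard form depends on context, which is one reason a context-aware model such as the one described in the abstract is needed in practice.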
Anthology ID:
2025.coling-demos.18
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Brodie Mather, Mark Dras
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
183–188
URL:
https://aclanthology.org/2025.coling-demos.18/
Cite (ACL):
Anh Thi-Hoang Nguyen, Dung Ha Nguyen, and Kiet Van Nguyen. 2025. ViSoLex: An Open-Source Repository for Vietnamese Social Media Lexical Normalization. In Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations, pages 183–188, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
ViSoLex: An Open-Source Repository for Vietnamese Social Media Lexical Normalization (Nguyen et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-demos.18.pdf