Hebrew Diacritics Restoration using Visual Representation

Yair Elboher, Yuval Pinter


Abstract
Diacritics restoration in Hebrew is a fundamental task for ensuring accurate word pronunciation and disambiguating textual meaning. Despite the language’s high degree of ambiguity when unvocalized, recent machine learning approaches have significantly advanced performance on this task. In this work, we present DiVRit, a novel system for Hebrew diacritization that frames the task as a zero-shot classification problem. Our approach operates at the word level, selecting the most appropriate diacritization pattern for each undiacritized word from a dynamically generated candidate set, conditioned on the surrounding textual context. A key innovation of DiVRit is its use of a Hebrew Visual Language Model to process diacritized candidates as images, allowing diacritic information to be embedded directly within their vector representations while the surrounding context remains tokenization-based. Through a comprehensive evaluation across various configurations, we demonstrate that the system effectively performs diacritization without relying on complex, explicit linguistic analysis. Notably, in an “oracle” setting where the correct diacritized form is guaranteed to be among the provided candidates, DiVRit achieves a high level of accuracy. Furthermore, strategic architectural enhancements and optimized training methodologies yield significant improvements in the system’s overall generalization capabilities. These findings highlight the promising potential of visual representations for accurate and automated Hebrew diacritization.
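The abstract describes the selection step only at a high level: each undiacritized word gets a candidate set, and the system picks the candidate that best fits the context. The sketch below is a minimal illustration under stated assumptions. It assumes the context and every candidate have already been mapped to vectors in a shared space. In DiVRit, candidates are rendered as images and encoded by a Hebrew visual language model, while the context is tokenized text. Cosine-similarity scoring, the helper names `cosine`, `select_candidate` and `oracle_candidates`, and the toy vectors are all hypothetical choices made for this example. They are not the paper's actual scoring head or training procedure.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_candidate(context_vec: Vector, candidate_vecs: Dict[str, Vector]) -> str:
    """Hypothetical scorer: return the diacritized form whose (assumed
    image-based) embedding is most similar to the context embedding."""
    return max(candidate_vecs, key=lambda form: cosine(context_vec, candidate_vecs[form]))


def oracle_candidates(candidates: List[str], gold: str) -> List[str]:
    """Oracle setting: guarantee the gold diacritized form is in the set."""
    return candidates if gold in candidates else candidates + [gold]


if __name__ == "__main__":
    # Toy vectors only; a real system would obtain these from trained encoders.
    toy = {
        "סֵפֶר": [0.9, 0.1],   # sefer, "book"
        "סָפַר": [0.1, 0.9],   # safar, "counted"
        "סִפֵּר": [0.5, 0.5],  # siper, "told"
    }
    print(select_candidate([1.0, 0.0], toy))  # → סֵפֶר
```

Framing selection this way lets the candidate set vary per word without a fixed output vocabulary, which is one reading of why the abstract calls the task zero-shot classification.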
Anthology ID: 2026.loreslm-1.44
Volume: Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue: LoResLM
Publisher: Association for Computational Linguistics
Pages: 504–514
URL: https://aclanthology.org/2026.loreslm-1.44/
Cite (ACL): Yair Elboher and Yuval Pinter. 2026. Hebrew Diacritics Restoration using Visual Representation. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 504–514, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Hebrew Diacritics Restoration using Visual Representation (Elboher & Pinter, LoResLM 2026)
PDF: https://aclanthology.org/2026.loreslm-1.44.pdf