Elazar Gershuni


2022

Restoring Hebrew Diacritics Without a Dictionary
Elazar Gershuni | Yuval Pinter
Findings of the Association for Computational Linguistics: NAACL 2022

We demonstrate that it is feasible to accurately diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present Nakdimon, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources. The model is accompanied by a training set and a test set, collected from diverse sources.