Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers

Niv Fono, Harel Moshayof, Eldar Karol, Itai Assraf, Mark Last


Abstract
Hebrew and Aramaic inscriptions serve as an essential source of information on the ancient history of the Near East. Unfortunately, some parts of the inscribed texts become illegible over time. Specialists called epigraphists use time-consuming manual procedures to estimate the missing content. This problem can be considered an extended masked language modeling task, where the damaged content can comprise single characters, character n-grams (partial words), single complete words, and multi-word n-grams. This study is the first attempt to apply the masked language modeling approach to corrupted inscriptions in the Hebrew and Aramaic languages, both of which use the Hebrew alphabet, which consists mostly of consonant symbols. In our experiments, we evaluate several transformer-based models, which are fine-tuned on Biblical texts and tested on three different percentages of randomly masked parts in the testing corpus. At every masking percentage, the highest text completion accuracy is obtained with a novel ensemble of word and character prediction models.
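To make the evaluation setup concrete, the following is a minimal sketch of how a fixed percentage of characters in a test text could be masked at random to simulate illegible parts of an inscription. It is an illustration under stated assumptions, not the authors' procedure: the function name `mask_characters`, the `?` placeholder, the fixed seed, and the choice to leave spaces unmasked are all hypothetical, and the 15% rate is an arbitrary example value rather than one of the three percentages used in the study.

```python
import random

MASK = "?"  # hypothetical placeholder for one illegible character


def mask_characters(text: str, fraction: float, seed: int = 0) -> str:
    """Replace about `fraction` of the non-space characters with MASK."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the masking is reproducible
    # Assumption: spaces are never masked, so word boundaries stay visible.
    positions = [i for i, ch in enumerate(text) if not ch.isspace()]
    k = round(len(positions) * fraction)
    chosen = set(rng.sample(positions, k))
    return "".join(MASK if i in chosen else ch for i, ch in enumerate(text))


# Genesis 1:1 in unvocalized Hebrew (28 letters, 6 spaces).
verse = "בראשית ברא אלהים את השמים ואת הארץ"
masked = mask_characters(verse, 0.15)
print(masked)
```

Depending on which positions are drawn, the masked characters may fall inside a single word or span several, which corresponds loosely to the partial-word and multi-word cases named in the abstract. A model's completion accuracy can then be measured by comparing its predictions at the masked positions with the original characters.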
Anthology ID:
2024.findings-eacl.56
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
846–852
URL:
https://aclanthology.org/2024.findings-eacl.56
Cite (ACL):
Niv Fono, Harel Moshayof, Eldar Karol, Itai Assraf, and Mark Last. 2024. Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers. In Findings of the Association for Computational Linguistics: EACL 2024, pages 846–852, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers (Fono et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-eacl.56.pdf