Analysis of Glyph and Writing System Similarities Using Siamese Neural Networks

Claire Roman, Philippe Meyer


Abstract
In this paper, we use Siamese neural networks to compare glyphs and writing systems. These deep learning models define distance-like functions, which we use to explore and visualize the space of scripts through multidimensional scaling and clustering analyses. Applying Ward-linkage hierarchical clustering to 51 historical European, Mediterranean and Middle Eastern alphabets, we obtain 10 clusters of scripts, including three isolated writing systems. To build the glyph database, we use the fonts of the Noto family, which encode the Unicode character repertoire in a standard form. This approach has the potential to reveal connections among scripts and civilizations and to aid the decipherment of ancient scripts.
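
As an illustration of the clustering step named in the abstract, the following is a minimal Python sketch of Ward-linkage agglomerative clustering on a precomputed distance matrix, using the Lance-Williams update. It is not the authors' implementation: the function name `ward_clusters`, the script labels, and the toy distances are hypothetical stand-ins for the distances a trained Siamese network would produce. Ward's update is formally derived for Euclidean distances, so applying it to a learned distance-like function is assumed here to be a heuristic.

```python
from itertools import combinations


def ward_clusters(dist, n_clusters):
    """Agglomerative clustering with Ward's criterion (Lance-Williams update).

    dist: symmetric matrix (list of lists) of pairwise distances.
    n_clusters: number of clusters to stop at (assumed >= 1).
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}
    # Ward's update is defined on squared distances.
    d2 = {frozenset((i, j)): dist[i][j] ** 2
          for i, j in combinations(range(n), 2)}
    next_id = n
    while len(members) > n_clusters:
        # Merge the pair of clusters with the smallest Ward distance.
        pair = min(d2, key=d2.get)
        a, b = sorted(pair)
        d_ab = d2[pair]
        na, nb = len(members[a]), len(members[b])
        updated = {}
        for k in members:
            if k in (a, b):
                continue
            nk = len(members[k])
            # Lance-Williams coefficients for Ward linkage.
            updated[k] = ((na + nk) * d2[frozenset((a, k))]
                          + (nb + nk) * d2[frozenset((b, k))]
                          - nk * d_ab) / (na + nb + nk)
        # Discard distances that involve the two merged clusters.
        d2 = {p: v for p, v in d2.items() if a not in p and b not in p}
        members[next_id] = members.pop(a) + members.pop(b)
        for k, v in updated.items():
            d2[frozenset((next_id, k))] = v
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Hypothetical distances between five invented scripts (not data from the paper).
labels = ["script_a", "script_b", "script_c", "script_d", "script_e"]
toy = [
    [0.0, 0.1, 0.9, 0.8, 0.9],
    [0.1, 0.0, 0.8, 0.9, 0.9],
    [0.9, 0.8, 0.0, 0.2, 0.9],
    [0.8, 0.9, 0.2, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.0],
]
clusters = ward_clusters(toy, 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

In this toy input, `script_e` is far from every other item and ends up alone, which is the sense in which a clustering can contain isolated writing systems.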
Anthology ID: 2024.lt4hala-1.12
Volume: Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Rachele Sprugnoli, Marco Passarotti
Venues: LT4HALA | WS
Publisher: ELRA and ICCL
Pages: 98–104
URL: https://aclanthology.org/2024.lt4hala-1.12
Cite (ACL): Claire Roman and Philippe Meyer. 2024. Analysis of Glyph and Writing System Similarities Using Siamese Neural Networks. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 98–104, Torino, Italia. ELRA and ICCL.
Cite (Informal): Analysis of Glyph and Writing System Similarities Using Siamese Neural Networks (Roman & Meyer, LT4HALA-WS 2024)
PDF: https://aclanthology.org/2024.lt4hala-1.12.pdf