Vowel and Consonant Classification through Spectral Decomposition

Patricia Thaine, Gerald Penn


Abstract
We consider two related problems. Given an undeciphered alphabetic writing system or mono-alphabetic cipher, determine: (1) which of its letters are vowels and which are consonants; and (2) whether the writing system is a vocalic alphabet or an abjad. We show that a very simple spectral decomposition based on character co-occurrences performs nearly perfectly on both questions.
Anthology ID: W17-4112
Volume: Proceedings of the First Workshop on Subword and Character Level Models in NLP
Month: September
Year: 2017
Address: Copenhagen, Denmark
Venue: SCLeM
Publisher: Association for Computational Linguistics
Pages: 82–91
URL: https://aclanthology.org/W17-4112
DOI: 10.18653/v1/W17-4112
Cite (ACL): Patricia Thaine and Gerald Penn. 2017. Vowel and Consonant Classification through Spectral Decomposition. In Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 82–91, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Vowel and Consonant Classification through Spectral Decomposition (Thaine & Penn, SCLeM 2017)
PDF: https://aclanthology.org/W17-4112.pdf
Attachment: W17-4112.Attachment.rar