OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network

Xavier Marjou


Abstract
To transcribe spoken language into a written medium, most alphabets rely on an unambiguous sound-to-letter rule. However, some writing systems have drifted away from this simple principle, and little work exists in Natural Language Processing (NLP) on measuring that distance. In this study, we use an Artificial Neural Network (ANN) model to evaluate the transparency between written words and their pronunciation, hence its name, Orthographic Transparency Estimation with an ANN (OTEANN). Using datasets derived from Wikimedia dictionaries, we trained and tested this model to score the percentage of false predictions in phoneme-to-grapheme and grapheme-to-phoneme translation tasks. The scores obtained on 17 orthographies were in line with the estimates of other studies. Interestingly, the model also provided insight into typical mistakes made by learners who only consider the phonemic rule in reading and writing.
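
The scoring idea in the abstract, the percentage of false predictions on a translation task, can be illustrated with a minimal sketch. It assumes that a prediction counts as false whenever the predicted string differs from the reference string for the whole word; the abstract does not state the exact matching criterion, and the function name `false_prediction_rate` and the toy word pairs are hypothetical, not taken from the paper or its code.

```python
from typing import Sequence


def false_prediction_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions that differ from their reference.

    Assumption (not from the paper): a prediction is false when the whole
    predicted string is not identical to the reference string.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    wrong = sum(pred != ref for pred, ref in zip(predictions, references))
    return 100.0 * wrong / len(references)


# Hypothetical toy data for a grapheme-to-phoneme task: the model output
# for the last word differs from its reference pronunciation.
reference_phonemes = ["kat", "dɔg", "fɪʃ", "naɪt"]
predicted_phonemes = ["kat", "dɔg", "fɪʃ", "knɪt"]

rate = false_prediction_rate(predicted_phonemes, reference_phonemes)
print(f"false predictions: {rate:.1f}%")
```

Under this assumption, a lower percentage on a given task would suggest a more transparent orthography in that direction, since the same function can be applied separately to the phoneme-to-grapheme and grapheme-to-phoneme outputs.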
Anthology ID:
2021.sigtyp-1.1
Volume:
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Month:
June
Year:
2021
Address:
Online
Editors:
Ekaterina Vylomova, Elizabeth Salesky, Sabrina Mielke, Gabriella Lapesa, Ritesh Kumar, Harald Hammarström, Ivan Vulić, Anna Korhonen, Roi Reichart, Edoardo Maria Ponti, Ryan Cotterell
Venue:
SIGTYP
SIG:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
1–9
URL:
https://aclanthology.org/2021.sigtyp-1.1
DOI:
10.18653/v1/2021.sigtyp-1.1
Cite (ACL):
Xavier Marjou. 2021. OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 1–9, Online. Association for Computational Linguistics.
Cite (Informal):
OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network (Marjou, SIGTYP 2021)
PDF:
https://aclanthology.org/2021.sigtyp-1.1.pdf
Code:
marxav/oteann3
Data:
OTEANNv3