Automatic diacritization of Tunisian dialect text using Recurrent Neural Network

Abir Masmoudi; Mariem Ellouze Khemekhem; Lamia Hadrich Belguith

doi:10.26615/978-954-452-056-4_085

Abstract

The absence of diacritical marks in Arabic text generally leads to morphological, syntactic, and semantic ambiguities. The problem is more acute for under-resourced languages such as the Tunisian dialect, which lacks basic tools and linguistic resources: sufficiently large corpora, multilingual dictionaries, and morphological and syntactic analyzers. Processing this dialect therefore faces greater challenges. Automatic diacritization of Modern Standard Arabic (MSA) text is one of many complex problems that deep neural networks can now solve. Because the Tunisian dialect is an under-resourced variety related to MSA, and the two share many similarities, we investigate a recurrent neural network (RNN) for diacritizing this dialect. We compare this model with our previous CRF and SMT models (CITATION) on the same dialect corpus. Experiments show that the RNN achieves a lower diacritic error rate (DER of 10.72%) than both the CRF model (DER of 20.25%) and the SMT model (DER of 33.15%).
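The abstract reports results as DER, commonly read as diacritic error rate, but does not define how it is computed. The following is a minimal Python sketch under one common assumption: DER is the fraction of non-space base characters whose predicted diacritics differ from the reference. The function names, the set of diacritic code points, and the handling of whitespace are illustrative choices, not the authors' evaluation script.

```python
# Arabic combining marks considered here (an assumption for this sketch):
# tanwin (U+064B..U+064D), short vowels (U+064E..U+0650),
# shadda (U+0651) and sukun (U+0652).
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")


def split_diacritics(text):
    """Return a list of (base_char, diacritics) pairs for `text`."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
            # a mark with no preceding base character is ignored
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference, hypothesis):
    """Fraction of non-space base characters with wrong diacritics."""
    ref = split_diacritics(reference)
    hyp = split_diacritics(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and hypothesis differ in base characters")
    # Compare marks per letter; order of stacked marks is ignored.
    letters = [(r, h) for (b, r), (_, h) in zip(ref, hyp) if not b.isspace()]
    if not letters:
        return 0.0
    errors = sum(1 for r, h in letters if sorted(r) != sorted(h))
    return errors / len(letters)


# Three letters (kaf, ta, ba), each with fatha in the reference;
# the hypothesis puts kasra on the second letter instead.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
hypothesis = "\u0643\u064E\u062A\u0650\u0628\u064E"
der = diacritic_error_rate(reference, hypothesis)
```

In this toy case one of three letters carries the wrong mark, so `der` is one third. Published DER figures can differ depending on whether undiacritized letters, case endings, or punctuation are counted, so the scores quoted above should be interpreted with the paper's own definition.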

Anthology ID: R19-1085
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month: September
Year: 2019
Address: Varna, Bulgaria
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 730–739
URL: https://aclanthology.org/R19-1085/
DOI: 10.26615/978-954-452-056-4_085
Cite (ACL): Abir Masmoudi, Mariem Ellouze, and Lamia Hadrich Belguith. 2019. Automatic diacritization of Tunisian dialect text using Recurrent Neural Network. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 730–739, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Automatic diacritization of Tunisian dialect text using Recurrent Neural Network (Masmoudi et al., RANLP 2019)
PDF: https://aclanthology.org/R19-1085.pdf
