Semantic Language Model for Tunisian Dialect

Abir Masmoudi, Rim Laatar, Mariem Ellouze, Lamia Hadrich Belguith


Abstract
In this paper, we describe the process of creating a statistical Language Model (LM) for the Tunisian Dialect. This work is part of the development of an Automatic Speech Recognition (ASR) system for the Tunisian Railway Transport Network. Because our field of work is restricted, many words behave similarly (semantically, for example) while having different occurrence probabilities, so they can be grouped into classes. For these reasons, we propose to build an n-class LM based mainly on the integration of purely semantic data, in which each class represents an abstraction over similar labels. To improve the sequence labelling task, we use a discriminative algorithm based on the Conditional Random Field (CRF) model. To assess our choice of an n-class word model, we compared it with a word 3-gram model on the same evaluation test corpus. Additionally, to assess the impact of using the CRF model for the semantic labelling that builds the semantic classes, we compared the n-class model built with CRF-based semantic labelling against an n-class model built without the CRF. The comparison of predictive power shows that the n-class model obtained with CRF-based semantic labelling outperforms the other two models in terms of perplexity.
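The idea behind an n-class LM can be illustrated with a minimal sketch. A class-based bigram approximates P(w_i | w_{i-1}) by P(c_i | c_{i-1}) · P(w_i | c_i), so words sharing a semantic class (for example, station names) pool their statistics, and models are compared by perplexity on a held-out corpus. Everything below is an assumption for illustration, not the authors' implementation: the toy corpus and the labels `CITY` and `HOUR` are invented, the labels are hand-coded where the paper would obtain them from its CRF-based semantic labelling, bigrams replace the paper's 3-grams for brevity, and add-one smoothing is assumed.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Hypothetical toy corpus of railway-information requests (English glosses,
# invented for illustration; not data from the paper).
TRAIN = [
    ["train", "to", "tunis", "at", "eight"],
    ["train", "to", "sfax", "at", "nine"],
    ["ticket", "to", "sousse", "at", "ten"],
]
TEST = [["train", "to", "sousse", "at", "nine"]]

# Hypothetical semantic labels, hand-coded here; in the paper's setting they
# would come from the semantic labelling step (e.g. a CRF tagger).
# A word without a label forms its own singleton class.
LABELS = {"tunis": "CITY", "sfax": "CITY", "sousse": "CITY",
          "eight": "HOUR", "nine": "HOUR", "ten": "HOUR"}


def word_of(w):
    return w


def class_of(w):
    return LABELS.get(w, w)


def pad(sent):
    return [BOS] + sent + [EOS]


def bigram_counts(corpus, mapping):
    """History and bigram counts after mapping every token through `mapping`."""
    hist, bi = Counter(), Counter()
    for sent in corpus:
        toks = [mapping(w) for w in pad(sent)]
        hist.update(toks[:-1])
        bi.update(zip(toks, toks[1:]))
    return hist, bi


def add_one(hist, bi, v, prev, cur):
    """Add-one (Laplace) smoothed bigram probability P(cur | prev)."""
    return (bi[(prev, cur)] + 1) / (hist[prev] + v)


def perplexity(corpus, logprob):
    """exp of the mean negative log-probability over predicted tokens (incl. </s>)."""
    total, n = 0.0, 0
    for sent in corpus:
        toks = pad(sent)
        for prev, cur in zip(toks, toks[1:]):
            total += logprob(prev, cur)
            n += 1
    return math.exp(-total / n)


def word_bigram_lm(train):
    hist, bi = bigram_counts(train, word_of)
    v = len({w for s in train for w in s}) + 1  # word vocabulary + </s>
    return lambda p, c: math.log(add_one(hist, bi, v, p, c))


def class_bigram_lm(train):
    # P(w_i | w_{i-1}) ~ P(c_i | c_{i-1}) * P(w_i | c_i)
    hist, bi = bigram_counts(train, class_of)
    v = len({class_of(w) for s in train for w in s}) + 1  # classes + </s>
    # Emission counts over predicted tokens (words and </s>); assumes every
    # test word was seen in training (this sketch has no OOV handling).
    wc = Counter(w for s in train for w in s + [EOS])
    cc = Counter(class_of(w) for s in train for w in s + [EOS])

    def logprob(p, c):
        trans = add_one(hist, bi, v, class_of(p), class_of(c))
        emit = wc[c] / cc[class_of(c)]
        return math.log(trans) + math.log(emit)

    return logprob


word_ppl = perplexity(TEST, word_bigram_lm(TRAIN))
class_ppl = perplexity(TEST, class_bigram_lm(TRAIN))
print(f"word bigram perplexity:  {word_ppl:.2f}")
print(f"class bigram perplexity: {class_ppl:.2f}")
```

On this toy data the class model assigns a lower perplexity because its transition counts are shared across all members of `CITY` and `HOUR`; with real data the outcome depends on the corpus, the smoothing method and the quality of the class labels.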
Anthology ID: R19-1084
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month: September
Year: 2019
Address: Varna, Bulgaria
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 720–729
URL: https://aclanthology.org/R19-1084
DOI: 10.26615/978-954-452-056-4_084
Cite (ACL): Abir Masmoudi, Rim Laatar, Mariem Ellouze, and Lamia Hadrich Belguith. 2019. Semantic Language Model for Tunisian Dialect. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 720–729, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Semantic Language Model for Tunisian Dialect (Masmoudi et al., RANLP 2019)
PDF: https://aclanthology.org/R19-1084.pdf