Emna Fsih

2022

A deep sentiment analysis of Tunisian dialect comments on multi-domain posts in different social media platforms
Emna Fsih | Rahma Boujelbane | Lamia Hadrich Belguith
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

pdf bib abs

Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect
Saméh Kchaou | Rahma Boujelbane | Emna Fsih | Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the growing access to the internet, the spoken Arabic dialect language becomes informal languages written in social media. Most users post comments using their own dialect. This linguistic situation inhibits mutual understanding between internet users and makes difficult to use computational approaches since most Arabic resources are intended for the formal language: Modern Standard Arabic (MSA). In this paper, we present a pipeline to standardize the written texts in social networks by translating them to the standard language MSA. We fine-tun at first an identification bert-based model to select Tunisian Dialect (TD) from MSA and other dialects. Then, we learned transformer model to translate TD to MSA. The final system includes the translated TD text and the originally text written in MSA. Each of these steps was evaluated on the same test corpus. In order to test the effectiveness of the approach, we compared two opinion analysis models, the first intended for the Sentiment Analysis (SA) of dialect texts and the second for the MSA texts. We concluded that through standardization we obtain the best score.

pdf bib abs

Benchmarking transfer learning approaches for sentiment analysis of Arabic dialect
Emna Fsih | Sameh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Arabic has a widely varying collection of dialects. With the explosion of the use of social networks, the volume of written texts has remarkably increased. Most users express themselves using their own dialect. Unfortunately, many of these dialects remain under-studied due to the scarcity of resources. Researchers and industry practitioners are increasingly interested in analyzing users’ sentiments. In this context, several approaches have been proposed, namely: traditional machine learning, deep learning transfer learning and more recently few-shot learning approaches. In this work, we compare their efficiency as part of the NADI competition to develop a country-level sentiment analysis model. Three models were beneficial for this sub-task: The first based on Sentence Transformer (ST) and achieve 43.23% on DEV set and 42.33% on TEST set, the second based on CAMeLBERT and achieve 47.85% on DEV set and 41.72% on TEST set and the third based on multi-dialect BERT model and achieve 66.72% on DEV set and 39.69% on TEST set.

Co-authors

Venues

Fix author