Rim El Ballouli
2017
CAT: Credibility Analysis of Arabic Content on Twitter
Rim El Ballouli | Wassim El-Hajj | Ahmad Ghandour | Shady Elbassuoni | Hazem Hajj | Khaled Shaban
Proceedings of the Third Arabic Natural Language Processing Workshop
Data generated on Twitter has become a rich source for various data mining tasks. Tasks that depend on tweet semantics, such as sentiment analysis, emotion mining, and rumor detection, suffer considerably if the tweet is not credible, not real, or spam. In this paper, we perform an extensive analysis of the credibility of Arabic content on Twitter. We also build a classification model (CAT) to automatically predict the credibility of a given Arabic tweet. Of particular originality is the inclusion of features extracted directly or indirectly from the author’s profile and timeline. To train and test CAT, we annotated for credibility a topic-independent data set of 9,000 Arabic tweets. CAT achieved consistent improvements in predicting the credibility of tweets when compared to several baselines and to the state-of-the-art approach, with an improvement of 21% in weighted average F-measure over the latter. We also conducted experiments to highlight the importance of the user-based features as opposed to the content-based features. We conclude our work with a feature reduction experiment that highlights the most indicative features of credibility.
2016
Arabic Corpora for Credibility Analysis
Ayman Al Zaatari | Rim El Ballouli | Shady ELbassouni | Wassim El-Hajj | Hazem Hajj | Khaled Shaban | Nizar Habash | Emad Yahya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
A significant portion of the data generated on blogging and microblogging websites is non-credible, as shown in many recent studies. To filter out such non-credible information, machine learning can be deployed to build automatic credibility classifiers. However, as is the case with most supervised machine learning approaches, a sufficiently large and accurate training data set must be available. In this paper, we focus on building a public Arabic corpus of blogs and microblogs that can be used for credibility classification. We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and the lack of any such public corpora in Arabic. We discuss our data acquisition approach and annotation process, provide a rigorous analysis of the annotated data, and finally report some results on the effectiveness of our data for credibility classification.
Co-authors
- Wassim El-Hajj 2
- Hazem Hajj 2
- Khaled Shaban 2
- Ayman Al Zaatari 1
- Shady ELbassouni 1