AraFacts: The First Large Arabic Dataset of Naturally Occurring Claims

Zien Sheikh Ali, Watheq Mansour, Tamer Elsayed, Abdulaziz Al‐Ali


Abstract
We introduce AraFacts, the first large Arabic dataset of naturally occurring claims, collected from five Arabic fact-checking websites (e.g., Fatabyyano and Misbar) and covering claims from 2016 onward. The dataset consists of 6,121 claims along with their factual labels and additional metadata, such as fact-checking article content, topical category, and links to posts or Web pages spreading the claim. Because the data is obtained from several fact-checking websites, each with its own labels, we standardize the original claim labels to provide a unified label rating for all claims. We also report dataset statistics and motivate the use of the dataset by suggesting possible research applications. The dataset is publicly available to the research community.
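The label standardization mentioned in the abstract can be pictured as a lookup from each site's original rating to a shared label set. The following is only a minimal sketch of that idea, not the authors' procedure: the raw labels, the unified label names, the `label` field, and the functions `normalize_label` and `label_distribution` are all hypothetical, since the abstract does not list the actual label vocabulary.

```python
from collections import Counter

# Hypothetical mapping from site-specific ratings to a unified label set.
# The real labels used by the fact-checking websites are not given in the
# abstract, so every key and value here is an illustrative assumption.
UNIFIED_LABELS = {
    "false": "false",
    "fake": "false",
    "misleading": "partly_false",
    "partly false": "partly_false",
    "true": "true",
    "correct": "true",
}


def normalize_label(raw_label: str, default: str = "other") -> str:
    """Map a raw site label to its unified label, ignoring case and extra spaces."""
    key = " ".join(raw_label.strip().lower().split())
    return UNIFIED_LABELS.get(key, default)


def label_distribution(claims):
    """Count unified labels over claim records that carry a 'label' field."""
    return Counter(normalize_label(claim["label"]) for claim in claims)
```

Labels missing from the mapping fall back to a catch-all value rather than raising an error, which makes unmapped ratings easy to spot when inspecting the resulting counts.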
Anthology ID:
2021.wanlp-1.26
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
231–236
URL:
https://aclanthology.org/2021.wanlp-1.26
Cite (ACL):
Zien Sheikh Ali, Watheq Mansour, Tamer Elsayed, and Abdulaziz Al‐Ali. 2021. AraFacts: The First Large Arabic Dataset of Naturally Occurring Claims. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 231–236, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
AraFacts: The First Large Arabic Dataset of Naturally Occurring Claims (Sheikh Ali et al., WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.26.pdf
Code:
bigirqu/arafacts