AraFacts: The First Large Arabic Dataset of Naturally Occurring Claims

Zien Sheikh Ali, Watheq Mansour, Tamer Elsayed, Abdulaziz Al‐Ali


Abstract
We introduce AraFacts, the first large Arabic dataset of naturally occurring claims. The claims were collected from five Arabic fact-checking websites (e.g., Fatabyyano and Misbar) and date from 2016 onward. The dataset consists of 6,121 claims. Each claim comes with its factual label and additional metadata, such as the content of the fact-checking article, its topical category, and links to posts or Web pages spreading the claim. The data come from several fact-checking websites that label claims differently, so we standardize the original labels into a single unified rating for all claims. We also report descriptive statistics of the dataset and motivate its use by suggesting possible research applications. The dataset is publicly available to the research community.
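The label standardization step described in the abstract maps each website's verdict vocabulary onto one shared rating scale. The following is a minimal sketch of that kind of mapping. The four-way unified scale, the site keys `site_a` and `site_b`, their verdict strings, and the names `LABEL_MAP` and `normalize_label` are all hypothetical placeholders. They are not AraFacts' actual label scheme or any real site's vocabulary.

```python
from collections import Counter

# Hypothetical unified rating scale (illustrative only, not AraFacts' exact labels).
UNIFIED_LABELS = {"True", "Partly-false", "False", "Unverifiable"}

# Hypothetical per-site mappings from original verdict strings (lowercased)
# to unified ratings. Site keys and verdicts are placeholders.
LABEL_MAP = {
    "site_a": {"correct": "True", "misleading": "Partly-false", "fake": "False"},
    "site_b": {
        "accurate": "True",
        "half-true": "Partly-false",
        "false": "False",
        "unproven": "Unverifiable",
    },
}

# Sanity check: every mapping target must belong to the unified scale.
for site, mapping in LABEL_MAP.items():
    for unified in mapping.values():
        if unified not in UNIFIED_LABELS:
            raise ValueError(f"{site!r} maps to unknown rating {unified!r}")


def normalize_label(site: str, original_label: str) -> str:
    """Return the unified rating for a site-specific verdict.

    Matching ignores case and surrounding whitespace. Raises ValueError
    for an unknown site or an unmapped verdict.
    """
    key = original_label.strip().lower()
    try:
        return LABEL_MAP[site][key]
    except KeyError:
        raise ValueError(
            f"No mapping for label {original_label!r} from {site!r}"
        ) from None


# Hypothetical (site, original verdict) pairs standing in for collected claims.
claims = [
    ("site_a", "Fake"),
    ("site_b", "Half-True"),
    ("site_b", "unproven"),
    ("site_a", "Correct"),
]
counts = Counter(normalize_label(site, label) for site, label in claims)
print(counts)
```

Raising an error on unmapped verdicts is one possible design choice. It makes any gap in a site's mapping visible instead of silently dropping the claim or assigning it a default rating.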
Anthology ID: 2021.wanlp-1.26
Volume: Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month: April
Year: 2021
Address: Kyiv, Ukraine (Virtual)
Venues: EACL | WANLP
Publisher: Association for Computational Linguistics
Pages: 231–236
URL: https://aclanthology.org/2021.wanlp-1.26
PDF: https://aclanthology.org/2021.wanlp-1.26.pdf
Code: bigirqu/arafacts