Compilation of an Arabic Children’s Corpus

Latifa Al-Sulaiti, Noorhan Abbas, Claire Brierley, Eric Atwell, Ayman Alghamdi


Abstract
Inspired by the Oxford Children’s Corpus, we have developed a prototype corpus of Arabic texts written and/or selected for children. Our Arabic Children’s Corpus of 2950 documents and nearly 2 million words has been collected manually from the web during a 3-month project. It is of high quality, and contains a range of different children’s genres based on sources located, including classic tales from The Arabian Nights, and popular fictional characters such as Goha. We anticipate that the current and subsequent versions of our corpus will lead to interesting studies in text classification, language use, and ideology in children’s texts.
Anthology ID:
L16-1285
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1808–1812
URL:
https://aclanthology.org/L16-1285
Cite (ACL):
Latifa Al-Sulaiti, Noorhan Abbas, Claire Brierley, Eric Atwell, and Ayman Alghamdi. 2016. Compilation of an Arabic Children’s Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1808–1812, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Compilation of an Arabic Children’s Corpus (Al-Sulaiti et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1285.pdf