QurAna: Corpus of the Quran annotated with Pronominal Anaphora

Abdul-Baquee Sharaf, Eric Atwell


Abstract
This paper presents QurAna: a large corpus created from the original Quranic text, in which personal pronouns are tagged with their antecedents. These antecedents are maintained as an ontological list of concepts, which has proved helpful for information retrieval tasks. QurAna is characterized by: (a) a comparatively large number of pronouns tagged with antecedent information (over 24,500 pronouns), and (b) the maintenance of an ontological concept list derived from these antecedents. We have shown useful applications of this corpus. This corpus is the first of its kind for classical Arabic text, and it could be used for interesting applications in Modern Standard Arabic as well. It would help researchers obtain empirical evidence and rules for building new anaphora resolution approaches. Such a corpus could also be used to train, optimize and evaluate existing approaches.
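The abstract describes two linked pieces: pronoun occurrences tagged with an antecedent, and a shared concept list built from those antecedents. The sketch below is only an illustration of why that pairing supports retrieval; the record fields (`chapter`, `verse`, `token`, `pronoun`, `concept_id`) and the helpers `build_concept_index` and `verses_mentioning` are hypothetical and do not reflect QurAna's actual file format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PronounAnnotation:
    """One tagged pronoun (hypothetical schema, not the QurAna file format)."""
    chapter: int      # chapter (sura) number
    verse: int        # verse (aya) number
    token: int        # position of the pronoun-bearing word in the verse
    pronoun: str      # surface form of the pronoun (transliterated here)
    concept_id: int   # index into the shared concept list of antecedents


def build_concept_index(annotations):
    """Group pronoun occurrences by the concept their antecedent maps to."""
    index = defaultdict(list)
    for a in annotations:
        index[a.concept_id].append((a.chapter, a.verse, a.token))
    return dict(index)


def verses_mentioning(index, concept_id):
    """Return sorted, de-duplicated (chapter, verse) pairs that refer to a concept."""
    return sorted({(c, v) for c, v, _ in index.get(concept_id, [])})
```

With toy values (not taken from the corpus), indexing pronouns by concept lets a query for one concept return every verse that refers to it through a pronoun, even where the concept is never named explicitly in that verse.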
Anthology ID:
L12-1011
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
130–137
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/123_Paper.pdf
Cite (ACL):
Abdul-Baquee Sharaf and Eric Atwell. 2012. QurAna: Corpus of the Quran annotated with Pronominal Anaphora. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 130–137, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
QurAna: Corpus of the Quran annotated with Pronominal Anaphora (Sharaf & Atwell, LREC 2012)