Morphological Segmentation and Part of Speech Tagging for Religious Arabic

Emad Mohamed


Abstract
We annotate a small corpus of religious Arabic with morphological segmentation boundaries and fine-grained, segment-based part-of-speech (POS) tags. Experiments on both segmentation and POS tagging show that the segmenter and POS tagger trained on the religious corpus outperform those trained on the Arabic Treebank, although the latter is 21 times as large, which shows the need for building religious Arabic linguistic resources. The small corpus we annotate improves segmentation accuracy by about 5% absolute (from 90.84% to 95.70%), POS tagging accuracy by 9% absolute (from 82.22% to 91.26%) when using gold-standard segmentation, and by 9.6% absolute (from 78.62% to 88.22%) when using automatic segmentation.
Anthology ID:
2012.amta-caas14.9
Volume:
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
Month:
November 1
Year:
2012
Address:
San Diego, California, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
65–71
URL:
https://aclanthology.org/2012.amta-caas14.9
PDF:
https://aclanthology.org/2012.amta-caas14.9.pdf