Morphological Segmentation and Part of Speech Tagging for Religious Arabic

Emad Mohamed


Abstract
We annotate a small corpus of religious Arabic with morphological segmentation boundaries and fine-grained segment-based part of speech tags. Experiments on both segmentation and POS tagging show that the segmenter and POS tagger trained on the religious corpus outperform those trained on the Arabic Treebank, although the Treebank is 21 times as large, which shows the need for building religious Arabic linguistic resources. The small corpus we annotate improves segmentation accuracy by about 5% absolute (from 90.84% to 95.70%), and POS tagging by 9% absolute (from 82.22% to 91.26%) when using gold standard segmentation, and by 9.6% absolute (from 78.62% to 88.22%) when using automatic segmentation.
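As a quick check of the figures quoted in the abstract, the absolute gains can be recomputed from the before and after accuracies. This is only a minimal sketch: the dictionary keys and the function name `absolute_gain` are illustrative and do not come from the paper.

```python
# Accuracies (in percent) quoted in the abstract:
# (Arabic Treebank-trained, religious-corpus-trained).
# The key names are illustrative labels, not the paper's terminology.
ACCURACIES = {
    "segmentation": (90.84, 95.70),
    "pos_gold_segmentation": (82.22, 91.26),
    "pos_automatic_segmentation": (78.62, 88.22),
}


def absolute_gain(before: float, after: float) -> float:
    """Return the absolute improvement in percentage points."""
    return round(after - before, 2)


gains = {task: absolute_gain(b, a) for task, (b, a) in ACCURACIES.items()}

for task, gain in gains.items():
    print(f"{task}: +{gain:.2f} points absolute")
```

The differences are absolute (percentage-point) gains, not relative error reductions, which is how the abstract reports them.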
Anthology ID:
2012.amta-caas14.9
Volume:
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
Month:
November 1
Year:
2012
Address:
San Diego, California, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
65–71
URL:
https://aclanthology.org/2012.amta-caas14.9
Cite (ACL):
Emad Mohamed. 2012. Morphological Segmentation and Part of Speech Tagging for Religious Arabic. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages, pages 65–71, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Morphological Segmentation and Part of Speech Tagging for Religious Arabic (Mohamed, AMTA 2012)
PDF:
https://aclanthology.org/2012.amta-caas14.9.pdf