Morphological Segmentation and Part of Speech Tagging for Religious Arabic

Emad Mohamed

Abstract

We annotate a small corpus of religious Arabic with morphological segmentation boundaries and fine-grained, segment-based part-of-speech (POS) tags. Experiments on both segmentation and POS tagging show that the segmenter and POS tagger trained on the religious corpus outperform those trained on the Arabic Treebank, even though the Treebank is 21 times as large, which shows the need for building religious Arabic linguistic resources. The small corpus we annotate improves segmentation accuracy by about 5% absolute (from 90.84% to 95.70%), POS tagging by about 9% absolute (from 82.22% to 91.26%) when using gold-standard segmentation, and by 9.6% absolute (from 78.62% to 88.22%) when using automatic segmentation.

Anthology ID: 2012.amta-caas14.9
Volume: Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
Month: November 1
Year: 2012
Address: San Diego, California, USA
Editors: Ali Farghaly, Farhad Oroumchian
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 65–71
URL: https://aclanthology.org/2012.amta-caas14.9/
Cite (ACL): Emad Mohamed. 2012. Morphological Segmentation and Part of Speech Tagging for Religious Arabic. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages, pages 65–71, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal): Morphological Segmentation and Part of Speech Tagging for Religious Arabic (Mohamed, AMTA 2012)
PDF: https://aclanthology.org/2012.amta-caas14.9.pdf
