Mohammed A. Attia
2006
An Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic Modeling Finite State Networks
Mohammed A. Attia
Proceedings of the International Conference on the Challenge of Arabic for NLP/MT
Morphological ambiguity is a major concern for syntactic parsers, POS taggers and other NLP tools. For example, the greater the number of morphological analyses given for a lexical entry, the longer a parser takes in analyzing a sentence, and the greater the number of parses it produces. Xerox Arabic Finite State Morphology and Buckwalter Arabic Morphological Analyzer are two of the best known, well documented, morphological analyzers for Modern Standard Arabic (MSA). Yet there are significant problems with both systems in design as well as coverage that increase the ambiguity rate. This paper shows how an ambiguity-controlled morphological analyzer for Arabic is built in a rule-based system that takes the stem as the base form using finite state technology. The paper also points out sources of legal and illegal ambiguities in MSA, and how ambiguity in the new system is reduced without compromising precision. At the end, an evaluation of Xerox, Buckwalter, and our system is conducted, and the performance is compared and analyzed.