Erel Segal


pdf bib
A corpus based morphological analyzer for unvocalized modern Hebrew
Alon Itai | Erel Segal
Workshop on Machine Translation for Semitic languages: issues and approaches

Most words in Modern Hebrew texts are morphologically ambiguous. We describe a method for finding the correct morphological analysis of each word in a Modern Hebrew text. The program first uses a small tagged corpus to estimate the probability of each possible analysis of each word regardless of its context and chooses the most probable analysis. It then applies automatically learned rules to correct the analysis of each word according to its neighbors. Finally, it uses a simple syntactical analyzer to further correct the analysis, thus combining statistical methods with rule-based syntactic analysis. It is shown that this combination greatly improves the accuracy of the morphological analysis—achieving up to 96.2% accuracy.