A corpus based morphological analyzer for unvocalized modern Hebrew

Alon Itai, Erel Segal


Abstract
Most words in Modern Hebrew texts are morphologically ambiguous. We describe a method for finding the correct morphological analysis of each word in a Modern Hebrew text. The program first uses a small tagged corpus to estimate the probability of each possible analysis of each word, regardless of its context, and chooses the most probable analysis. It then applies automatically learned rules to correct the analysis of each word according to its neighbors. Finally, it uses a simple syntactic analyzer to further correct the analysis, thus combining statistical methods with rule-based syntactic analysis. We show that this combination greatly improves the accuracy of the morphological analysis, achieving up to 96.2% accuracy.
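
The first two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the relative-frequency estimate, the rule format (previous analysis, current analysis, corrected analysis), the tag names, the transliterated toy words, and all function names are assumptions made for illustration only. The third stage, the syntactic analyzer, is omitted.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data): each sentence is a list of
# (word, analysis) pairs. Real analyses would be full morphological
# descriptions; coarse tags stand in for them here.
TOY_CORPUS = [
    [("hwa", "PRON"), ("spr", "VERB")],
    [("spr", "NOUN"), ("xdš", "ADJ")],
    [("spr", "NOUN"), ("yšn", "ADJ")],
]

# Hypothetical correction rules: (previous analysis, current analysis,
# corrected analysis). In the paper such rules are learned automatically;
# here one is written by hand.
TOY_RULES = [("PRON", "NOUN", "VERB")]


def train_context_free(tagged_corpus):
    """Count how often each analysis was chosen for each word."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, analysis in sentence:
            counts[word][analysis] += 1
    return counts


def most_probable(word, counts, default="UNKNOWN"):
    """Stage 1: pick the most frequent analysis, ignoring context."""
    if word not in counts:
        return default
    return counts[word].most_common(1)[0][0]


def apply_rules(words, analyses, rules, counts):
    """Stage 2: correct each analysis according to its left neighbor."""
    corrected = list(analyses)
    for i in range(1, len(words)):
        for prev_tag, old_tag, new_tag in rules:
            # Only switch to an analysis that is possible for this word
            # (assumption: "possible" means seen in the tagged corpus).
            if (corrected[i - 1] == prev_tag
                    and corrected[i] == old_tag
                    and new_tag in counts.get(words[i], {})):
                corrected[i] = new_tag
                break
    return corrected


def analyze(words, counts, rules):
    """Run the context-free choice, then the neighbor-based correction."""
    first_pass = [most_probable(w, counts) for w in words]
    return apply_rules(words, first_pass, rules, counts)
```

With this toy data, the context-free stage alone labels "spr" as a noun because that analysis is more frequent, and the hand-written rule then changes it to a verb when it follows a pronoun. A real system would draw the set of possible analyses from a morphological lexicon rather than from the tagged corpus.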
Anthology ID:
2003.mtsummit-semit.9
Volume:
Workshop on Machine Translation for Semitic languages: issues and approaches
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
URL:
https://aclanthology.org/2003.mtsummit-semit.9
Cite (ACL):
Alon Itai and Erel Segal. 2003. A corpus based morphological analyzer for unvocalized modern Hebrew. In Workshop on Machine Translation for Semitic languages: issues and approaches, New Orleans, USA.
Cite (Informal):
A corpus based morphological analyzer for unvocalized modern Hebrew (Itai & Segal, MTSummit 2003)
PDF:
https://aclanthology.org/2003.mtsummit-semit.9.pdf