ARNE - A tool for Named Entity Recognition from Arabic Text

Carolin Shihadeh, Günter Neumann


Abstract
In this paper, we study the problem of finding named entities in Arabic text. For this task, we present the development of ARNE, our pipeline software for Arabic named entity recognition, which includes tokenization, morphological analysis, Buckwalter transliteration, part-of-speech tagging, and recognition of person, location, and organisation named entities. In our first attempt to recognize named entities, we used a simple, fast, and language-independent gazetteer lookup approach. In our second attempt, we used the morphological analysis provided by our pipeline to remove affixes and observed an improvement in performance. The pipeline presented in this paper can serve in the future as a basis for a named entity recognition system that recognizes named entities not only using gazetteers, but also by making use of morphological information and part-of-speech tagging.
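The two attempts described in the abstract (plain gazetteer lookup, then lookup after affix removal) can be illustrated with a minimal Python sketch. This is not the ARNE implementation: the gazetteer entries, the `PREFIXES` list, and the function names `candidate_forms` and `tag_tokens` are all hypothetical, and a fixed list of one-letter clitic prefixes stands in for the morphological analysis the pipeline actually uses. Only single-token entries are handled.

```python
# Hypothetical toy gazetteer (surface form -> entity type); not ARNE's data.
GAZETTEER = {
    "بغداد": "LOC",  # Baghdad
    "محمد": "PER",   # Muhammad
}

# Hypothetical one-letter clitic prefixes. This is an assumed stand-in for
# real morphological analysis, which the sketch does not perform.
PREFIXES = ("و", "ب", "ل", "ف", "ك")


def candidate_forms(token):
    """Yield the token itself, then forms with leading clitics stripped."""
    yield token
    stripped = token
    # Keep at least two characters so short words are not stripped to nothing.
    while len(stripped) > 2 and stripped.startswith(PREFIXES):
        stripped = stripped[1:]
        yield stripped


def tag_tokens(tokens, strip_affixes=False):
    """Label each token with a gazetteer type, or "O" if no form matches."""
    tagged = []
    for token in tokens:
        forms = candidate_forms(token) if strip_affixes else [token]
        label = next((GAZETTEER[f] for f in forms if f in GAZETTEER), "O")
        tagged.append((token, label))
    return tagged


if __name__ == "__main__":
    sentence = ["سافر", "محمد", "وبغداد"]
    print(tag_tokens(sentence))                      # exact lookup only
    print(tag_tokens(sentence, strip_affixes=True))  # lookup after stripping
```

Because the unmodified token is always tried first, a name that merely begins with one of the prefix letters still matches exactly; stripping only adds further candidates for tokens that carry an attached conjunction or preposition.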
Anthology ID:
2012.amta-caas14.4
Volume:
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
Month:
November 1
Year:
2012
Address:
San Diego, California, USA
Editors:
Ali Farghaly, Farhad Oroumchian
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
24–31
URL:
https://aclanthology.org/2012.amta-caas14.4
Cite (ACL):
Carolin Shihadeh and Günter Neumann. 2012. ARNE - A tool for Named Entity Recognition from Arabic Text. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages, pages 24–31, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
ARNE - A tool for Named Entity Recognition from Arabic Text (Shihadeh & Neumann, AMTA 2012)
PDF:
https://aclanthology.org/2012.amta-caas14.4.pdf