Using Stemming in Morphological Analysis to Improve Arabic Information Retrieval

Nasredine Semmar, Meriama Laib, Christian Fluhr


Abstract
Information retrieval (IR) consists of finding all documents in a collection that are relevant to a user query. These documents are ranked by their probability of being relevant to the query, so the highest-ranked document is considered the most likely to be relevant. Natural Language Processing (NLP) for IR aims to transform the potentially ambiguous words of queries and documents into unambiguous internal representations on which matching and retrieval can take place. This transformation is generally achieved through several levels of linguistic analysis (morphological, syntactic, and so forth). In this paper, we present the Arabic linguistic analyzer used in the LIC2M cross-lingual search engine. We focus on the morphological analyzer, and particularly on the clitic stemmer, which segments input words into proclitics, simple forms and enclitics. We demonstrate that stemming improves the recall and precision of the search engine.
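To make the idea of clitic segmentation concrete, here is a minimal sketch of splitting a word into a proclitic, a simple form and an enclitic. It is not the LIC2M analyzer: the clitic inventories, the three-word lexicon, and the function name `segment` are hypothetical, and validating the simple form against a lexicon is an assumption made for this illustration.

```python
# Toy clitic segmentation sketch. All resources below are hypothetical
# and deliberately tiny; they are NOT the LIC2M analyzer's data.
PROCLITICS = ["وال", "بال", "ال", "و", "ب", "ل"]   # e.g. "and the", "with the", "the", "and", "with", "to"
ENCLITICS = ["هما", "هم", "ها", "ه", "ي"]          # attached pronouns
LEXICON = {"كتاب", "بيت", "ولد"}                    # known simple forms: book, house, boy


def segment(word):
    """Return every (proclitic, simple form, enclitic) split of `word`
    whose simple form is found in LEXICON. Empty strings mean "no clitic"."""
    results = []
    for pro in [""] + PROCLITICS:
        if not word.startswith(pro):
            continue
        rest = word[len(pro):]
        for enc in [""] + ENCLITICS:
            if not rest.endswith(enc):
                continue
            # Strip the enclitic (a no-op when enc is the empty string).
            stem = rest[:len(rest) - len(enc)]
            if stem in LEXICON:
                results.append((pro, stem, enc))
    return results
```

With these toy resources, `segment("بكتابهم")` yields the single split `("ب", "كتاب", "هم")`, while `segment("ولد")` keeps the word whole as `("", "ولد", "")`: stripping the initial "و" would leave a string that is not in the lexicon, which is how this sketch avoids over-segmenting a word that merely begins with a clitic-like letter.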
Anthology ID:
2006.jeptalnrecital-long.29
Volume:
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Month:
April
Year:
2006
Address:
Leuven, Belgique
Editors:
Piet Mertens, Cédrick Fairon, Anne Dister, Patrick Watrin
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
318–327
URL:
https://aclanthology.org/2006.jeptalnrecital-long.29
Cite (ACL):
Nasredine Semmar, Meriama Laib, and Christian Fluhr. 2006. Using Stemming in Morphological Analysis to Improve Arabic Information Retrieval. In Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, pages 318–327, Leuven, Belgique. ATALA.
Cite (Informal):
Using Stemming in Morphological Analysis to Improve Arabic Information Retrieval (Semmar et al., JEP/TALN/RECITAL 2006)
PDF:
https://aclanthology.org/2006.jeptalnrecital-long.29.pdf