Semantic Similarity of Arabic Sentences with Word Embeddings

El Moatez Billah Nagoudi, Didier Schwab


Abstract
Semantic textual similarity is the basis of countless applications and plays an important role in diverse areas, such as information retrieval, plagiarism detection, information extraction and machine translation. This article proposes an innovative word embedding-based system for calculating the semantic similarity of Arabic sentences. The main idea is to represent words as vectors in a multidimensional space in order to capture their semantic and syntactic properties. IDF weighting and Part-of-Speech tagging are applied to the examined sentences to help identify the words that are most descriptive in each sentence. The performance of the proposed system is measured by the Pearson correlation between its assigned semantic similarity scores and human judgments.
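The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: build a sentence vector as an IDF-weighted sum of word vectors, compare two sentences by cosine similarity, and evaluate against human scores with the Pearson correlation. The function names, the toy English tokens, the two-dimensional embeddings and the logarithmic IDF formula are all assumptions made for illustration and are not taken from the paper. Part-of-Speech weighting is omitted.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """IDF per word from a list of tokenized sentences (assumed formula: log(N / df))."""
    n_docs = len(corpus)
    doc_freq = Counter(word for sent in corpus for word in set(sent))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def sentence_vector(tokens, embeddings, idf):
    """IDF-weighted sum of the vectors of the tokens found in `embeddings`."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary tokens are skipped
        weight = idf.get(tok, 1.0)  # assumption: words without an IDF get weight 1
        for i, x in enumerate(embeddings[tok]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical toy data: real systems would use pretrained Arabic embeddings.
EMBEDDINGS = {
    "king": [1.0, 0.0],
    "queen": [0.9, 0.1],
    "apple": [0.0, 1.0],
    "the": [0.5, 0.5],
}
CORPUS = [["the", "king"], ["the", "queen"], ["the", "apple"]]
IDF = idf_weights(CORPUS)


def similarity(sent_a, sent_b):
    """Cosine similarity between the IDF-weighted vectors of two token lists."""
    return cosine(
        sentence_vector(sent_a, EMBEDDINGS, IDF),
        sentence_vector(sent_b, EMBEDDINGS, IDF),
    )


if __name__ == "__main__":
    print(similarity(["the", "king"], ["the", "queen"]))
    print(similarity(["the", "king"], ["the", "apple"]))
```

In this toy corpus the word "the" occurs in every sentence, so its IDF is zero and it contributes nothing to the sentence vectors. This is the effect the weighting is meant to have on uninformative words.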
Anthology ID:
W17-1303
Volume:
Proceedings of the Third Arabic Natural Language Processing Workshop
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
SIG:
SEMITIC
Publisher:
Association for Computational Linguistics
Pages:
18–24
URL:
https://aclanthology.org/W17-1303
DOI:
10.18653/v1/W17-1303
Cite (ACL):
El Moatez Billah Nagoudi and Didier Schwab. 2017. Semantic Similarity of Arabic Sentences with Word Embeddings. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 18–24, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Semantic Similarity of Arabic Sentences with Word Embeddings (Nagoudi & Schwab, WANLP 2017)
PDF:
https://aclanthology.org/W17-1303.pdf