ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model

Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, Didier Schwab


Abstract
Word Embeddings (WE) are increasingly popular and widely applied in Natural Language Processing (NLP) applications such as Machine Translation (MT), Information Retrieval (IR), and Information Extraction (IE), owing to their effectiveness in capturing the semantic properties of words. In this paper, we propose ArbEngVec, an open-source resource that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset of more than 93 million Arabic-English parallel sentence pairs. In addition, we perform both extrinsic and intrinsic evaluations of the different word embedding model variants. The extrinsic evaluation assesses model performance on cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.
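For illustration only (not code from the paper), the minimal Python sketch below shows how a shared Arabic-English embedding space of this kind could be queried for the two evaluation tasks mentioned in the abstract: Word Translation via cross-lingual nearest neighbours, and cross-language STS via cosine similarity of averaged word vectors. It assumes the released vectors are in word2vec text format and loadable with gensim; the file name arbengvec_skipgram.vec and the example words and sentences are hypothetical.

    # Illustrative sketch, assuming a word2vec-format model mixing
    # Arabic and English tokens in one vector space.
    import numpy as np
    from gensim.models import KeyedVectors

    # Hypothetical file name; replace with an actual ArbEngVec model file.
    model = KeyedVectors.load_word2vec_format("arbengvec_skipgram.vec", binary=False)

    # Word Translation (WT): nearest neighbours of an English word should
    # include its Arabic translation (and vice versa).
    print(model.most_similar("book", topn=10))

    # Cross-language STS: score a sentence pair by the cosine similarity
    # of their averaged word vectors.
    def sentence_vector(tokens):
        vecs = [model[t] for t in tokens if t in model]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

    def sts_score(sent_a, sent_b):
        a = sentence_vector(sent_a.split())
        b = sentence_vector(sent_b.split())
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom else 0.0

    print(sts_score("the boy reads a book", "الولد يقرأ كتابا"))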
Anthology ID:
W19-4605
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
40–48
URL:
https://aclanthology.org/W19-4605
DOI:
10.18653/v1/W19-4605
Cite (ACL):
Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, and Didier Schwab. 2019. ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 40–48, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model (Lachraf et al., WANLP 2019)
PDF:
https://aclanthology.org/W19-4605.pdf