En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects

Taghreed Alqaisi, Simon O’Keefe


Abstract
This paper presents the first attempt to investigate the effect of morphological segmentation on En-Ar bilingual word embeddings, using the bilingual word embeddings model without word alignment (BilBOWA). We also investigate the effect of sentence length and embedding size on the learning process. Our experiment shows that the D3 segmentation scheme improves the accuracy of learning bilingual word embeddings by up to 10 percentage points compared to the ATB and D0 schemes, across all training settings.
Anthology ID:
W19-4611
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
97–107
URL:
https://aclanthology.org/W19-4611/
DOI:
10.18653/v1/W19-4611
Cite (ACL):
Taghreed Alqaisi and Simon O’Keefe. 2019. En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 97–107, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects (Alqaisi & O’Keefe, WANLP 2019)
PDF:
https://aclanthology.org/W19-4611.pdf