Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA

Wafia Adouane, Jean-Philippe Bernardy, Simon Dobnik


Abstract
We explore the extent to which neural networks can learn to identify semantically equivalent sentences from a small, variable dataset using end-to-end training. We collect a new noisy, non-standardised, user-generated Algerian (ALG) dataset and translate it into Modern Standard Arabic (MSA), which serves as its regularised counterpart. We compare the performance of various models on both datasets and report the best-performing configurations. The results show that relatively simple models composed of two LSTM layers far outperform more sophisticated attention-based architectures on both the ALG and MSA datasets.
Anthology ID:
W19-4609
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
78–87
URL:
https://aclanthology.org/W19-4609
DOI:
10.18653/v1/W19-4609
Cite (ACL):
Wafia Adouane, Jean-Philippe Bernardy, and Simon Dobnik. 2019. Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 78–87, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA (Adouane et al., WANLP 2019)
PDF:
https://aclanthology.org/W19-4609.pdf