Towards a Computational Lexicon for Moroccan Darija: Words, Idioms, and Constructions

Jamal Laoudi, Claire Bonial, Lucia Donatelli, Stephen Tratz, Clare Voss


Abstract
In this paper, we explore the challenges of building a computational lexicon for Moroccan Darija (MD), an Arabic dialect spoken by over 32 million people worldwide that has only recently begun to appear frequently in written form on social media. We ask what belongs in such a lexicon and begin by describing our work building traditional word-level lexicon entries with their English translations. We then discuss the challenges of translating idiomatic MD text, which led us to create multi-word expression lexicon entries whose meanings cannot be fully derived from their individual words. Finally, we provide a preliminary exploration of constructions to consider for inclusion in an MD constructicon, translating examples of English constructions and examining their MD counterparts.
Anthology ID: W18-4910
Volume: Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Venue: LAW
SIGs: SIGLEX | SIGANN
Publisher: Association for Computational Linguistics
Pages: 74–85
URL: https://aclanthology.org/W18-4910
Cite (ACL): Jamal Laoudi, Claire Bonial, Lucia Donatelli, Stephen Tratz, and Clare Voss. 2018. Towards a Computational Lexicon for Moroccan Darija: Words, Idioms, and Constructions. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 74–85, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Towards a Computational Lexicon for Moroccan Darija: Words, Idioms, and Constructions (Laoudi et al., LAW 2018)
PDF: https://aclanthology.org/W18-4910.pdf