Morphology-rich Alphasyllabary Embeddings

Amanuel Mersha, Stephen Wu


Abstract
Word embeddings have been successfully trained in many languages. However, both intrinsic and extrinsic metrics vary across languages, especially for languages that depart significantly from English in morphology and orthography. This study focuses on building a word embedding model suitable for Amharic, a Semitic language of Ethiopia that is both morphologically rich and written as an alphasyllabary (abugida) rather than an alphabet. We compare embeddings from tailored neural models, simple pre-processing steps, off-the-shelf baselines, and parallel tasks on a better-resourced Semitic language, Arabic. Experiments report our model's performance on word analogy tasks and illustrate the divergent objectives of morphological vs. semantic analogies.
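The abstract does not specify how the analogy tasks are scored. A widely used approach is the vector-offset method (often called 3CosAdd): for "a is to b as c is to ?", return the vocabulary word whose vector is most cosine-similar to b - a + c, excluding the three query words. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes NumPy, and the function name `analogy` and the 3-dimensional toy vectors are hypothetical and invented for illustration. In the toy data, the third dimension stands in for a "past tense" feature, which makes this a morphological analogy.

```python
import numpy as np

def analogy(a, b, c, vocab, vectors, topn=1):
    """Hypothetical helper: 3CosAdd analogy 'a : b :: c : ?' over a toy embedding matrix."""
    # L2-normalize each row so that dot products equal cosine similarities
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    # Offset vector: b - a + c, normalized to unit length
    target = unit[idx[b]] - unit[idx[a]] + unit[idx[c]]
    target = target / np.linalg.norm(target)
    scores = unit @ target
    # Conventionally, the query words themselves are excluded as answers
    for w in (a, b, c):
        scores[idx[w]] = -np.inf
    order = np.argsort(-scores)
    return [vocab[i] for i in order[:topn]]

# Invented toy data: dims = (walk-ness, jump-ness, past tense)
vocab = ["walk", "walked", "jump", "jumped", "run"]
vectors = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.7, 0.7, 0.0],
])

print(analogy("walk", "walked", "jump", vocab, vectors))  # ['jumped']
```

The same function works unchanged for semantic analogies. The only difference is which relation the offset b - a is meant to capture. A model whose vector space encodes inflectional affixes strongly can score well on morphological analogies while doing worse on semantic ones, which is the tension the abstract points to.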
Anthology ID:
2020.lrec-1.315
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2590–2595
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.315
Cite (ACL):
Amanuel Mersha and Stephen Wu. 2020. Morphology-rich Alphasyllabary Embeddings. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2590–2595, Marseille, France. European Language Resources Association.
Cite (Informal):
Morphology-rich Alphasyllabary Embeddings (Mersha & Wu, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.315.pdf