Ibrahim Gashaw
2021
MUCS@ - Machine Translation for Dravidian Languages using Stacked Long Short Term Memory
Asha Hegde | Ibrahim Gashaw | Shashirekha H.l.
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
The Dravidian language family is one of the largest language families in the world. Despite their uniqueness, Dravidian languages have received very little attention because of the scarcity of resources for language technology tasks such as translation, Parts-of-Speech tagging, and Word Sense Disambiguation. In this paper, we, team MUCS, describe the sequence-to-sequence stacked Long Short Term Memory (LSTM) based Neural Machine Translation (NMT) model submitted to “Machine Translation in Dravidian languages”, a shared task organized by EACL-2021. The NMT model was applied to translation using the English-Tamil, English-Telugu, English-Malayalam and Tamil-Telugu corpora provided by the organizers. The standard evaluation metric Bilingual Evaluation Understudy (BLEU) and human evaluation are used to evaluate the model. Our models exhibited good accuracy for all the language pairs and obtained 2nd rank for the Tamil-Telugu language pair.
2019
Language Modelling with NMT Query Translation for Amharic-Arabic Cross-Language Information Retrieval
Ibrahim Gashaw | H.l Shashirekha
Proceedings of the 16th International Conference on Natural Language Processing
This paper describes our first experiment on Neural Machine Translation (NMT) based query translation for the Amharic-Arabic Cross-Language Information Retrieval (CLIR) task, which retrieves relevant documents from Amharic and Arabic text collections in response to a query expressed in the Amharic language. We used a pre-trained NMT model to map a query in the source language into an equivalent query in the target language. The relevant documents are then retrieved using a Language Modeling (LM) based retrieval algorithm. Experiments are conducted on four conventional IR models, namely Uni-gram and Bi-gram LM, the Probabilistic model, and the Vector Space Model (VSM). The results obtained illustrate that the proposed Uni-gram LM outperforms all other models for both Amharic and Arabic language document collections.
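The Uni-gram LM retrieval step mentioned in the abstract can be illustrated with a minimal query-likelihood sketch. This is not the authors' implementation: the abstract does not state how the language model is smoothed, so Jelinek-Mercer interpolation and the weight `lam` are assumptions, and the function names `unigram_lm_scores` and `rank` are hypothetical. Documents and the (already translated) query are assumed to be pre-tokenized lists of strings.

```python
import math
from collections import Counter


def unigram_lm_scores(query_tokens, docs, lam=0.5):
    """Score each document by its smoothed unigram query log-likelihood.

    Assumption: Jelinek-Mercer smoothing with weight `lam` (hypothetical
    value, not taken from the paper) mixes the document model with the
    collection model.
    """
    doc_counts = [Counter(doc) for doc in docs]

    # Collection-wide term counts act as the background model.
    collection = Counter()
    for counts in doc_counts:
        collection.update(counts)
    collection_len = sum(collection.values())

    scores = []
    for doc, counts in zip(docs, doc_counts):
        doc_len = len(doc)
        score = 0.0
        for term in query_tokens:
            p_doc = counts[term] / doc_len if doc_len else 0.0
            p_col = collection[term] / collection_len if collection_len else 0.0
            p = lam * p_doc + (1.0 - lam) * p_col
            if p == 0.0:
                # Term never occurs in the collection: likelihood is zero.
                score = float("-inf")
                break
            score += math.log(p)
        scores.append(score)
    return scores


def rank(query_tokens, docs, lam=0.5):
    """Return document indices ordered from most to least likely."""
    scores = unigram_lm_scores(query_tokens, docs, lam)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In a CLIR setting of the kind described, the Amharic query would first be translated by the NMT model, and the translated tokens would then be passed as `query_tokens` against the Arabic collection.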
2018
Machine Learning Approaches for Amharic Parts-of-speech Tagging
Ibrahim Gashaw | H. L. Shashirekha
Proceedings of the 15th International Conference on Natural Language Processing