Language Modelling with NMT Query Translation for Amharic-Arabic Cross-Language Information Retrieval

Ibrahim Gashaw, H.l Shashirekha


Abstract
This paper describes our first experiment on Neural Machine Translation (NMT) based query translation for the Amharic-Arabic Cross-Language Information Retrieval (CLIR) task, in which relevant documents are retrieved from Amharic and Arabic text collections in response to a query expressed in Amharic. We use a pre-trained NMT model to map a query in the source language to an equivalent query in the target language. Relevant documents are then retrieved using a Language Modeling (LM) based retrieval algorithm. Experiments are conducted with four conventional IR models: Uni-gram LM, Bi-gram LM, a probabilistic model, and the Vector Space Model (VSM). The results show that the proposed Uni-gram LM outperforms the other models on both the Amharic and Arabic document collections.
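To make the retrieval step concrete, the following is a minimal sketch of unigram query-likelihood ranking, the general technique behind a Uni-gram LM retrieval model. It is not the authors' implementation: the function name `unigram_lm_rank`, the Jelinek-Mercer smoothing, and the weight `lam` are assumptions made for illustration, since the abstract does not state which smoothing is used. The query is assumed to be already translated and tokenized.

```python
import math
from collections import Counter


def unigram_lm_rank(query_tokens, docs, lam=0.5):
    """Rank documents by query likelihood under a smoothed unigram LM.

    query_tokens: list of (already translated) query terms.
    docs: dict mapping a document id to its list of tokens.
    lam: Jelinek-Mercer interpolation weight (an assumed value).
    """
    # Collection-wide term counts serve as the background model.
    collection = Counter()
    for tokens in docs.values():
        collection.update(tokens)
    coll_len = sum(collection.values())

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        doc_len = len(tokens)
        score = 0.0
        for term in query_tokens:
            p_doc = tf[term] / doc_len if doc_len else 0.0
            p_coll = collection[term] / coll_len if coll_len else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
            if p == 0.0:
                # Term is absent from the whole collection; it cannot
                # change the relative order, so it is skipped.
                continue
            score += math.log(p)
        scores[doc_id] = score

    # Highest log-likelihood first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    toy_docs = {"d1": ["a", "b", "a"], "d2": ["b", "c"], "d3": ["c", "c"]}
    print(unigram_lm_rank(["c"], toy_docs))
```

In a cross-language setting, the same function would be called once per collection: with the original Amharic query against the Amharic documents, and with the NMT-translated query against the Arabic documents.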
Anthology ID:
2019.icon-1.7
Volume:
Proceedings of the 16th International Conference on Natural Language Processing
Month:
December
Year:
2019
Address:
International Institute of Information Technology, Hyderabad, India
Venue:
ICON
Publisher:
NLP Association of India
Pages:
56–64
URL:
https://aclanthology.org/2019.icon-1.7
Cite (ACL):
Ibrahim Gashaw and H.l Shashirekha. 2019. Language Modelling with NMT Query Translation for Amharic-Arabic Cross-Language Information Retrieval. In Proceedings of the 16th International Conference on Natural Language Processing, pages 56–64, International Institute of Information Technology, Hyderabad, India. NLP Association of India.
Cite (Informal):
Language Modelling with NMT Query Translation for Amharic-Arabic Cross-Language Information Retrieval (Gashaw & Shashirekha, ICON 2019)
PDF:
https://aclanthology.org/2019.icon-1.7.pdf