Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search

Ahmad Khwileh, Haithem Afli, Gareth Jones, Andy Way


Abstract
Cross Language Information Retrieval (CLIR) systems are a valuable tool for enabling speakers of one language to search for content of interest expressed in a different language. A group for whom this is of particular interest is bilingual Arabic speakers who wish to search for English language content using information needs expressed in Arabic queries. A key challenge in CLIR is crossing the language barrier between the query and the documents. The most common approach to bridging this gap is automated query translation, which can be unreliable for vague or short queries. In this work, we examine the potential for improving CLIR effectiveness by predicting translation effectiveness using Query Performance Prediction (QPP) techniques. We propose a novel QPP method to estimate the quality of translation for an Arabic-English Cross-lingual User-generated Speech Search (CLUGS) task. We present an empirical evaluation that demonstrates the quality of our method on alternative translation outputs extracted from an Arabic-to-English Machine Translation system developed for this task. Finally, we show how this framework can be integrated into CLUGS to find relevant translations for improved retrieval performance.
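The abstract does not spell out the proposed predictor, so the following is only a rough illustration of the general idea: score each alternative MT output with a QPP predictor and keep the highest-scoring one for retrieval. As an assumption, it swaps in a standard pre-retrieval predictor, average IDF, in place of the authors' method; the function names (`idf_table`, `avg_idf`, `select_translation`), the toy collection, and the candidate translations are all hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Map each term in a tokenized collection to its IDF, log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def avg_idf(query_terms, idf):
    """Average IDF predictor (a stand-in, not the paper's QPP method).

    Terms absent from the collection score 0, since a translated term that
    never occurs in the target documents cannot help retrieval.
    """
    if not query_terms:
        return 0.0
    return sum(idf.get(t, 0.0) for t in query_terms) / len(query_terms)


def select_translation(candidates, idf):
    """Return the candidate translation with the highest predicted score."""
    return max(candidates, key=lambda q: avg_idf(q.lower().split(), idf))


# Hypothetical toy English collection (e.g. tokenized speech transcripts).
docs = [
    "football match highlights goal".split(),
    "cooking recipe chicken rice".split(),
    "football training drills".split(),
    "news weather report".split(),
]
idf = idf_table(docs)

# Hypothetical alternative MT outputs for a single Arabic query.
candidates = ["football match highlights", "the ball game show"]
best = select_translation(candidates, idf)
print(best)  # prints: football match highlights
```

Average IDF is only a coarse specificity signal; a predictor designed to estimate translation quality, as the authors propose, would replace `avg_idf` while the selection step over alternative translations could stay the same.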
Anthology ID: W17-1313
Volume: Proceedings of the Third Arabic Natural Language Processing Workshop
Month: April
Year: 2017
Address: Valencia, Spain
Venue: WANLP
SIG: SEMITIC
Publisher: Association for Computational Linguistics
Pages: 100–109
URL: https://aclanthology.org/W17-1313
DOI: 10.18653/v1/W17-1313
Cite (ACL): Ahmad Khwileh, Haithem Afli, Gareth Jones, and Andy Way. 2017. Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 100–109, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search (Khwileh et al., WANLP 2017)
PDF: https://aclanthology.org/W17-1313.pdf