Ashraf Hatim Elneima

2026

Agentic AI for Human Resources: LLM-Driven Candidate Assessment
Kamer Ali Yuksel | Abdul Basit Anees | Ashraf Hatim Elneima | Sanjika Hewavitharana | Mohamed Al-Badrashiny | Hassan Sawaf
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

In this work, we present a modular and interpretable framework that uses Large Language Models (LLMs) to automate candidate assessment in recruitment. The system integrates diverse sources—including job descriptions, CVs, interview transcripts, and HR feedback—to generate structured evaluation reports that mirror expert judgment. Unlike traditional ATS tools that rely on keyword matching or shallow scoring, our approach employs role-specific, LLM-generated rubrics and a multi-agent architecture to perform fine-grained, criteria-driven evaluations. The framework outputs detailed assessment reports, candidate comparisons, and ranked recommendations that are transparent, auditable, and suitable for real-world hiring workflows. Beyond rubric-based analysis, we introduce an LLM-Driven Active Listwise Tournament mechanism for candidate ranking. Instead of noisy pairwise comparisons or inconsistent independent scoring, the LLM ranks small candidate subsets (“mini-tournaments”), and these listwise permutations are aggregated using a Plackett–Luce model. An active-learning loop selects the most informative subsets, producing globally coherent and sample-efficient rankings. This adaptation of listwise LLM preference modeling—previously explored in financial asset ranking —provides a principled and highly interpretable methodology for large-scale candidate ranking in talent acquisition.

pdf bib abs

Media-to-Insights: A Multi-Agent AI System for Continuous Media Monitoring, Analysis, and Reporting
Ashraf Hatim Elneima | Ozan Yilmaz | Hadi Nasrallah | Sanjika Hewavitharana | Mohamed Al-Badrashiny | Hassan Sawaf
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Continuous monitoring of high-volume media streams requires systems that go beyond keyword alerts to deliver structured, actionable intelligence. We present a multi-agent media monitoring system that processes streaming articles through three stages: (1) a Matching Agent that uses a hybrid keyword-then-semantic matching approach, reducing agent invocations by 2̃0% (2) a batched multi-agent feature extraction, reducing core feature-extraction calls from 7 to 2 per article - a 71% reduction - with bounded quality tradeoffs; and (3) a Report Generation Agent that uses deterministic deduplication and density-based clustering. Four autonomous life-cycle agents manage the evolution of watchers.

2024

pdf bib abs

LLM-based MT Data Creation: Dialectal to MSA Translation Shared Task
AhmedElmogtaba Abdelmoniem Ali Abdelaziz | Ashraf Hatim Elneima | Kareem Darwish
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

This paper presents our approach to the Dialect to Modern Standard Arabic (MSA) Machine Translation shared task, conducted as part of the sixth Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT6). Our primary contribution is the development of a novel dataset derived from The Saudi Audio Dataset for Arabic (SADA) an Arabic audio corpus. By employing an automated method utilizing ChatGPT 3.5, we translated the dialectal Arabic texts to their MSA equivalents. This process not only yielded a unique and valuable dataset but also showcased an efficient method for leveraging language models in dataset generation. Utilizing this dataset, alongside additional resources, we trained a machine translation model based on the Transformer architecture. Through systematic experimentation with model configurations, we achieved notable improvements in translation quality. Our findings highlight the significance of LLM-assisted dataset creation methodologies and their impact on advancing machine translation systems, particularly for languages with considerable dialectal diversity like Arabic.

pdf bib abs

OSACT6 Dialect to MSA Translation Shared Task Overview
Ashraf Hatim Elneima | AhmedElmogtaba Abdelmoniem Ali Abdelaziz | Kareem Darwish
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

This paper presents the Dialectal Arabic (DA) to Modern Standard Arabic (MSA) Machine Translation (MT) shared task in the sixth Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT6). The paper describes the creation of the validation and test data and the metrics used; and provides a brief overview of the submissions to the shared task. In all, 29 teams signed up and 6 teams made actual submissions. The teams used a variety of datasets and approaches to build their MT systems. The most successful submission involved using zero-shot and n-shot prompting of chatGPT.

pdf bib abs

Arabic Diacritization Using Morphologically Informed Character-Level Model
Muhammad Morsy Elmallah | Mahmoud Reda | Kareem Darwish | Abdelrahman El-Sheikh | Ashraf Hatim Elneima | Murtadha Aljubran | Nouf Alsaeed | Reem Mohammed | Mohamed Al-Badrashiny
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Arabic diacritic recovery i.e. diacritization is necessary for proper vocalization and an enabler for downstream applications such as language learning and text to speech. Diacritics come in two varieties, namely: core-word diacritics and case endings. In this paper we introduce a highly effective morphologically informed character-level model that can recover both types of diacritics simultaneously. The model uses a Recurrent Neural Network (RNN) based architecture that takes in text as a sequence of characters, with markers for morphological segmentation, and outputs a sequence of diacritics. We also introduce a character-based morphological segmentation model that we train for Modern Standard Arabic (MSA) and dialectal Arabic. We demonstrate the efficacy of our diacritization model on Classical Arabic, MSA, and two dialectal (Moroccan and Tunisian) texts. We achieve the lowest reported word-level diacritization error rate for MSA (3.4%), match the best results for Classical Arabic (5.4%), and report competitive results for dialectal Arabic.

2022

Arabic diacritic recovery is important for a variety of downstream tasks such as text-to-speech. In this paper, we introduce a new Gulf Arabic diacritization dataset composed of 19,850 words based on a subset of the Gumar corpus. We provide comprehensive set of guidelines for diacritization to enable the diacritization of more data. We also report on diacritization results based on the new corpus using a Hidden Markov Model and character-based sequence to sequence models.