Hassan Sawaf

2026

Agentic AI for Human Resources: LLM-Driven Candidate Assessment
Kamer Ali Yuksel | Abdul Basit Anees | Ashraf Hatim Elneima | Sanjika Hewavitharana | Mohamed Al-Badrashiny | Hassan Sawaf
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

In this work, we present a modular and interpretable framework that uses Large Language Models (LLMs) to automate candidate assessment in recruitment. The system integrates diverse sources—including job descriptions, CVs, interview transcripts, and HR feedback—to generate structured evaluation reports that mirror expert judgment. Unlike traditional ATS tools that rely on keyword matching or shallow scoring, our approach employs role-specific, LLM-generated rubrics and a multi-agent architecture to perform fine-grained, criteria-driven evaluations. The framework outputs detailed assessment reports, candidate comparisons, and ranked recommendations that are transparent, auditable, and suitable for real-world hiring workflows. Beyond rubric-based analysis, we introduce an LLM-Driven Active Listwise Tournament mechanism for candidate ranking. Instead of noisy pairwise comparisons or inconsistent independent scoring, the LLM ranks small candidate subsets (“mini-tournaments”), and these listwise permutations are aggregated using a Plackett–Luce model. An active-learning loop selects the most informative subsets, producing globally coherent and sample-efficient rankings. This adaptation of listwise LLM preference modeling—previously explored in financial asset ranking —provides a principled and highly interpretable methodology for large-scale candidate ranking in talent acquisition.

2025

pdf bib abs

Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines
Yunsu Kim | Ahmedelmogtaba Abdelaziz | Thiago Castro Ferreira | Mohamed Al-Badrashiny | Hassan Sawaf
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

As the demand for artificial intelligence (AI) grows to address complex real-world tasks, single models are often insufficient, requiring the integration of multiple models into pipelines. This paper introduces Bel Esprit, a conversational agent designed to construct AI model pipelines based on user requirements. Bel Esprit uses a multi-agent framework where subagents collaborate to clarify requirements, build, validate, and populate pipelines with appropriate models. We demonstrate its effectiveness in generating pipelines from ambiguous user queries, using both human-curated and synthetic data. A detailed error analysis highlights ongoing challenges in pipeline building.Bel Esprit is available for a free trial at https://belesprit.aixplain.com.

pdf bib abs

A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops
Kamer Ali Yuksel | Thiago Castro Ferreira | Mohamed Al-Badrashiny | Hassan Sawaf
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

Agentic AI systems use specialized agents to handle tasks within complex workflows, enabling automation and efficiency. However, optimizing these systems often requires labor-intensive, manual adjustments to refine roles, tasks, and interactions. This paper introduces a framework for autonomously optimizing Agentic AI solutions across industries, such as NLG-driven enterprise applications. The system employs agents for Refinement, Execution, Evaluation, Modification, and Documentation, leveraging iterative feedback loops powered by an LLM (Llama 3.2-3B). The framework achieves optimal performance without human input by autonomously generating and testing hypotheses to improve system configurations. This approach enhances scalability and adaptability, offering a robust solution for real-world applications in dynamic environments. Case studies across diverse domains illustrate the transformative impact of this framework, showcasing significant improvements in output quality, relevance, and actionability. All data for these case studies, including original and evolved agent codes, along with their outputs, are here: https://anonymous.4open.science/r/evolver-1D11

2024

pdf bib abs

aiXplain SDK: A High-Level and Standardized Toolkit for AI Assets
Shreyas Sharma | Lucas Pavanelli | Thiago Castro Ferreira | Mohamed Al-Badrashiny | Hassan Sawaf
Proceedings of the 17th International Natural Language Generation Conference

The aiXplain SDK is an open-source Python toolkit which aims to simplify the wide and complex ecosystem of AI resources. The toolkit enables access to a wide selection of AI assets, including datasets, models, and metrics, from both academic and commercial sources, which can be selected, executed and evaluated in one place through different services in a standardized format with consistent documentation provided. The study showcases the potential of the proposed toolkit with different code examples and by using it on a user journey where state-of-the-art Large Language Models are fine-tuned on instruction prompt datasets, outperforming their base versions.

2023

pdf bib

pdf bib abs

EvolveMT: an Ensemble MT Engine Improving Itself with Usage Only
Kamer Yüksel | Ahmet Gunduz | Mohamed Al-badrashiny | Hassan Sawaf
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

This work proposes a method named EvolveMT for the efficient combination of multiple machine translation (MT) engines. The method selects the output from one engine for each segment, using online learning techniques to predict the most appropriate system for each translation request. A neural quality estimation metric supervises the method without requiring reference translations. The method’s online learning capability enables it to adapt to changes in the domain or MT engines dynamically, eliminating the requirement for retraining. The method selects a subset of translation engines to be called based on the source sentence features. The degree of exploration is configurable according to the desired quality-cost trade-off. Results from custom datasets demonstrate that EvolveMT achieves similar translation accuracy at a lower cost than selecting the best translation of each segment from all translations using an MT quality estimator. To the best of our knowledge, EvolveMT is the first MT system that adapts itself after deployment to incoming translation requests from the production environment without needing costly retraining on human feedback.

2022

pdf bib abs

Efficient Machine Translation Corpus Generation
Kamer Ali Yuksel | Ahmet Gunduz | Shreyas Sharma | Hassan Sawaf
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Workshop 2: Corpus Generation and Corpus Augmentation for Machine Translation)

This paper proposes an efficient and semi-automated method for human-in-the-loop post- editing for machine translation (MT) corpus generation. The method is based on online training of a custom MT quality estimation metric on-the-fly as linguists perform post-edits. The online estimator is used to prioritize worse hypotheses for post-editing, and auto-close best hypothe- ses without post-editing. This way, significant improvements can be achieved in the resulting quality of post-edits at a lower cost due to reduced human involvement. The trained estimator can also provide an online sanity check mechanism for post-edits and remove the need for ad- ditional linguists to review them or work on the same hypotheses. In this paper, the effect of prioritizing with the proposed method on the resulting MT corpus quality is presented versus scheduling hypotheses randomly. As demonstrated by experiments, the proposed method im- proves the lifecycle of MT models by focusing the linguist effort on production samples and hypotheses, which matter most for expanding MT corpora to be used for re-training them

pdf bib abs

The performance of Machine Translation (MT) systems varies significantly with inputs of diverging features such as topics, genres, and surface properties. Though there are many MT evaluation metrics that generally correlate with human judgments, they are not directly useful in identifying specific shortcomings of MT systems. In this demo, we present a benchmarking interface that enables improved evaluation of specific MT systems in isolation or multiple MT systems collectively by quantitatively evaluating their performance on many tasks across multiple domains and evaluation metrics. Further, it facilitates effective debugging and error analysis of MT output via the use of dynamic filters that help users hone in on problem sentences with specific properties, such as genre, topic, sentence length, etc. The interface can be extended to include additional filters such as lexical, morphological, and syntactic features. Aside from helping debug MT output, it can also help in identifying problems in reference translations and evaluation metrics.

2020

pdf bib abs

We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing. Our architecture features neural machine translation generating output of preferred length, prosodic alignment of the translation with the original speech segments, neural text-to-speech with fine tuning of the duration of each utterance, and, finally, audio rendering to enriches text-to-speech output with background noise and reverberation extracted from the original audio. We report and discuss results of a first subjective evaluation of automatic dubbing of excerpts of TED Talks from English into Italian, which measures the perceived naturalness of automatic dubbing and the relative importance of each proposed enhancement.

We describe a system to rapidly generate high-quality closed captions and subtitles for live broadcasted TV shows, using automated components, namely Automatic Speech Recognition and Machine Translation. The human stays in the loop for quality assurance and optional post-editing. We also describe how the system feeds the human edits and corrections back into the different components for improvement of these components and with that of the overall system. We finally describe the operation of this system in a real life environment within a broadcast network, where we implemented the system to transcribe, process broadcast transmissions and generate high-quality closed captions in Arabic and translate these into English subtitles in short time.

2010

bib abs

User-generated System for Critical Document Triage and Exploitation–Version 2011
Kristen Summers | Hassan Sawaf
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

CACI has developed and delivered systems for document exploitation and processing to Government customers around the world. Many of these systems include advanced language processing capabilities in order to enable rapid triage of vast collections of foreign language documents, separating the content that requires immediate human attention from the less immediately pressing material. AppTek provides key patent-pending Machine Translation technology for this critical process, rendering material in Arabic, Farsi and other languages into an English rendition that enables both further automated processing and rapid review by monolingual analysts, to identify the documents that require immediate linguist attention. Both CACI and AppTek have been working with customers to develop capabilities that enable them, the users, to be the ones in command of making their systems learn and continuously improve. We will describe how we put this critical user requirement into the systems and the key role that the user's involvement played in this. We will also discuss some of the key components of the system and what the customer-centric evolution of the system will be, including our document translation workflow, the machine translation technology within it, and our approaches to supporting the technology and sustaining its success designed around adapting to user needs.

pdf bib abs

Arabic Dialect Handling in Hybrid Machine Translation
Hassan Sawaf
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

In this paper, we describe an extension to a hybrid machine translation system for handling dialect Arabic, using a decoding algorithm to normalize non-standard, spontaneous and dialectal Arabic into Modern Standard Arabic. We prove the feasibility of the approach by measuring and comparing machine translation results in terms of BLEU with and without the proposed approach. We show in our tests that on real-live broadcast input with transcriptions of dialectal speech we achieve an increase on BLEU of about 1%, and on web content with dialect text of about 2%.

2008

pdf bib abs

Hybrid Machine Translation Applied to Media Monitoring
Hassan Sawaf | Braddock Gaskill | Michael Veronis
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

In this paper, a system is presented that recognizes spoken utterances in Arabic Dialects which are translated into text in English. The input is recorded from a broadcast channel and recognized using automatic speech recognition that recognize Modern Standard Arabic and Iraqi Colloquial Arabic. The recognized utterances are normalized into Modern Standard Arabic and the output of this Modern Standard Arabic interlingua is then translated by a hybrid machine translation system, combining statistical and rule-based features.

2000

pdf bib abs

On the Use of Grammar Based Language Models for Statistical Machine Translation
Hassan Sawaf | Kai Schütz | Hermann Ney
Proceedings of the Sixth International Workshop on Parsing Technologies

In this paper, we describe some concepts of language models beyond the usually used standard trigram and use such language models for statistical machine translation. In statistical machine translation the language model is the a-priori knowledge source of the system about the target language. One important requirement for the language model is the correct word order, given a certain choice of words, and to score the translations generated by the translation model Pr(f₁^J/e^I₁), in view of the syntactic context. In addition to standard m-grams with long histories, we examine the use of Part-of-Speech based models as well as linguistically motivated grammars with stochastic parsing as a special type of language model. Translation results are given on the VERBMOBIL task, where translation is performed from German to English, with vocabulary sizes of 6500 and 4000 words, respectively.

Venues

Hassan Sawaf

2026

2025

2024

2023

2022

2020

2015

2014

2013

2012

2010

2008

2000

Co-authors

Venues