Salaheddin Alzubi

2024

Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
Tu Vu | Kalpesh Krishna | Salaheddin Alzubi | Chris Tar | Manaal Faruqui | Yun-Hsuan Sung
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

As large language models (LLMs) evolve, evaluating their output reliably becomes increasingly difficult due to the high cost of human evaluation. To address this, we introduce FLAMe, a family of Foundational Large Autorater Models. FLAMe is trained on a diverse set of over 100 quality assessment tasks, incorporating 5M+ human judgments curated from publicly released human evaluations. FLAMe outperforms models like GPT-4 and Claude-3 on various held-out tasks, and serves as a powerful starting point for fine-tuning, as shown in our reward model evaluation case study (FLAMe-RM). On Reward-Bench, FLAMe-RM-24B achieves 87.8% accuracy, surpassing GPT-4-0125 (85.9%) and GPT-4o (84.7%). Additionally, we introduce FLAMe-Opt-RM, an efficient tail-patch fine-tuning approach that offers competitive RewardBench performance using 25×fewer training datapoints. Our FLAMe variants outperform popular proprietary LLM-as-a-Judge models on 8 of 12 autorater benchmarks, covering 53 quality assessment tasks, including RewardBench and LLM-AggreFact. Finally, our analysis shows that FLAMe is significantly less biased than other LLM-as-a-Judge models on the CoBBLEr autorater bias benchmark.

2022

pdf bib abs

aiXplain at Arabic Hate Speech 2022: An Ensemble Based Approach to Detecting Offensive Tweets
Salaheddin Alzubi | Thiago Castro Ferreira | Lucas Pavanelli | Mohamed Al-Badrashiny
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

Abusive speech on online platforms has a detrimental effect on users’ mental health. This warrants the need for innovative solutions that automatically moderate content, especially on online platforms such as Twitter where a user’s anonymity is loosely controlled. This paper outlines aiXplain Inc.’s ensemble based approach to detecting offensive speech in the Arabic language based on OSACT5’s shared sub-task A. Additionally, this paper highlights multiple challenges that may hinder progress on detecting abusive speech and provides potential avenues and techniques that may lead to significant progress.

Co-authors

Yun-Hsuan Sung 1

Chris Tar 1

Tu Vu 1

Venues

EMNLP1
OSACT1

Fix author