Evaluating Large Language Models on Health-Related Claims Across Arabic Dialects

Abdulsalam obaid Alharbi; Abdullah Alsuhaibani; Abdulrahman Abdullah Alalawi; Usman Naseem; Shoaib Jameel; Salil Kanhere; Imran Razzak

Abstract

Although Large Language Models (LLMs) have become popular across many tasks, their ability to handle health-related claims in diverse linguistic and cultural contexts, such as the Saudi, Egyptian, Lebanese, and Moroccan dialects of Arabic, has not been thoroughly explored. To this end, we develop a comprehensive evaluation framework to assess how LLMs, particularly GPT-4, respond to health-related claims. Our framework focuses on measuring factual accuracy, consistency, and cultural adaptability. It introduces a new metric, the “Cultural Sensitivity Score”, to evaluate the model’s ability to adjust responses based on dialectal differences. Additionally, we analyze the reasoning patterns used by the models to assess how effectively they engage with claims across these dialects. Our findings show that while LLMs excel at recognizing true claims, they encounter difficulties with mixed and ambiguous claims, especially in underrepresented dialects. This work underscores the importance of dialect-specific evaluations to ensure accurate, contextually appropriate, and culturally sensitive responses from LLMs in real-world applications.

Anthology ID: 2025.abjadnlp-1.11
Volume: Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editor: Mo El-Haj
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 95–103
URL: https://aclanthology.org/2025.abjadnlp-1.11/
Cite (ACL): Abdulsalam obaid Alharbi, Abdullah Alsuhaibani, Abdulrahman Abdullah Alalawi, Usman Naseem, Shoaib Jameel, Salil Kanhere, and Imran Razzak. 2025. Evaluating Large Language Models on Health-Related Claims Across Arabic Dialects. In Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, pages 95–103, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Evaluating Large Language Models on Health-Related Claims Across Arabic Dialects (Alharbi et al., AbjadNLP 2025)
PDF: https://aclanthology.org/2025.abjadnlp-1.11.pdf
