Thomas Law
2025
Can LLMs Verify Arabic Claims? Evaluating the Arabic Fact-Checking Abilities of Multilingual LLMs
Ayushman Gupta | Aryan Singhal | Thomas Law | Veekshith Rao | Evan Duan | Ryan Luo Li
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script
Large language models (LLMs) have demonstrated potential in fact-checking claims, yet their capabilities in verifying claims in multilingual contexts remain largely understudied. This paper investigates the efficacy of various prompting techniques, viz. Zero-Shot, English Chain-of-Thought, Self-Consistency, and Cross-Lingual Prompting, in enhancing the fact-checking and claim-verification abilities of LLMs for Arabic claims. We utilize 771 Arabic claims sourced from the X-fact dataset to benchmark the performance of four LLMs. To the best of our knowledge, ours is the first study to benchmark the inherent Arabic fact-checking abilities of LLMs stemming from their knowledge of Arabic facts, using a variety of prompting methods. Our results reveal significant variations in accuracy across different prompting methods. Our findings suggest that Cross-Lingual Prompting outperforms other methods, leading to notable performance gains.
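As a rough illustration of the Cross-Lingual Prompting setup the abstract describes, the sketch below asks a model to translate an Arabic claim into English, reason in English, and then emit a single verdict label. The model name, prompt wording, and label set are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of Cross-Lingual Prompting for Arabic claim verification.
# Model name, prompt wording, and label set are illustrative assumptions,
# not the exact setup used in the paper.
from openai import OpenAI

client = OpenAI()

LABELS = ["true", "false", "partly true", "unverifiable"]  # assumed label set

def verify_claim_cross_lingual(arabic_claim: str, model: str = "gpt-4o") -> str:
    """Translate the claim into English, reason in English, then output one label."""
    prompt = (
        "You are a fact-checker.\n"
        f"Arabic claim: {arabic_claim}\n\n"
        "Step 1: Translate the claim into English.\n"
        "Step 2: Reason step by step in English about whether it is factual.\n"
        f"Step 3: Answer with exactly one label from {LABELS}."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```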
2024
Multilingual Fact-Checking using LLMs
Aryan Singhal | Thomas Law | Coby Kassner | Ayushman Gupta | Evan Duan | Aviral Damle | Ryan Luo Li
Proceedings of the Third Workshop on NLP for Positive Impact
Due to the recent rise in digital misinformation, there has been great interest shown in using LLMs for fact-checking and claim verification. In this paper, we answer the question: Do LLMs know multilingual facts and can they use this knowledge for effective fact-checking? To this end, we create a benchmark by filtering multilingual claims from the X-fact dataset and evaluate the multilingual fact-checking capabilities of five LLMs on it across five diverse languages: Spanish, Italian, Portuguese, Turkish, and Tamil. We employ three different prompting techniques: Zero-Shot, English Chain-of-Thought, and Cross-Lingual Prompting, using both greedy and self-consistency decoding. We extensively analyze our results and find that GPT-4o achieves the highest accuracy, but zero-shot prompting with self-consistency is the most effective technique overall. We also show that techniques like Chain-of-Thought and Cross-Lingual Prompting, which are designed to improve reasoning abilities, do not necessarily improve the fact-checking abilities of LLMs. Interestingly, we find a strong negative correlation between model accuracy and the amount of internet content for a given language. This suggests that LLMs are better at fact-checking from knowledge in low-resource languages. We hope that this study will encourage more work on multilingual fact-checking using LLMs.
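The zero-shot prompting with self-consistency decoding that the abstract highlights can be sketched as sampling several independent verdicts at non-zero temperature and taking a majority vote. Again, the model name, prompt, and label parsing below are assumptions for illustration only.

```python
# Minimal sketch of zero-shot prompting with self-consistency decoding:
# sample several verdicts at non-zero temperature and take the majority vote.
# Model name, prompt, and label parsing are illustrative assumptions.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def zero_shot_self_consistency(claim: str, language: str,
                               model: str = "gpt-4o", n_samples: int = 5) -> str:
    prompt = (
        f"Claim ({language}): {claim}\n"
        "Is this claim true, false, partly true, or unverifiable? "
        "Answer with one label only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,   # non-zero temperature so the sampled verdicts differ
        n=n_samples,       # draw several independent verdicts
    )
    votes = [choice.message.content.strip().lower() for choice in response.choices]
    return Counter(votes).most_common(1)[0][0]  # majority-vote verdict
```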
Co-authors
- Evan Duan 2
- Ayushman Gupta 2
- Ryan Luo Li 2
- Aryan Singhal 2
- Aviral Damle 1