Ohan Ahmad


2025

CoVeGAT: A Hybrid LLM & Graph‐Attention Pipeline for Accurate Citation‐Aligned Claim Verification
Max Bader | Akshatha Arunkumar | Ohan Ahmad | Maruf Hassen | Charles Duong | Kevin Zhu
Proceedings of the First Workshop on Ethical Concerns in Training, Evaluating and Deploying Large Language Models

Modern LLMs often generate fluent text yet fabricate, misquote, or misattribute evidence. To quantify this flaw, we built a balanced Citation‐Alignment Dataset of 500 genuine, expert‐verified claim–quote pairs and 500 minimally perturbed false variants from news, legal, scientific, and literary sources. We then propose CoVeGAT, which converts claims and citations into SVO triplets (with trigram fallback), scores each pair via an LLM‐driven chain of verification, and embeds them in a weighted semantic graph. A Graph Attention Network over BERT embeddings issues strict pass/fail judgments on alignment. Zero‐shot evaluation of seven top LLMs (e.g., GPT‐4o, Gemini 1.5, Mistral 7B) reveals a trade‐off: decisive models reach 82.5% accuracy but err confidently, while cautious ones fall below 50%. A MiniLM + RBF kernel baseline, by contrast, achieves 96.4% accuracy, underscoring the power of simple, interpretable methods.

CoVeGAT: A Hybrid LLM & Graph‐Attention Pipeline for Accurate Citation‐Aligned Claim Verification
Max Bader | Akshatha Arunkumar | Ohan Ahmad | Maruf Hassen | Charles Duong | Vasu Sharma | Sean O’Brien | Kevin Zhu
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models
