Ahmed Zahran

2024

FactAlign: Fact-Level Hallucination Detection and Classification Through Knowledge Graph Alignment
Mohamed Rashad | Ahmed Zahran | Abanoub Amin | Amr Abdelaal | Mohamed Altantawy
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)

This paper proposes a novel black-box approach for fact-level hallucination detection and classification by transforming the problem into a knowledge graph alignment task. This approach allows us to classify detected hallucinations as either intrinsic or extrinsic. The paper starts by discussing the field of hallucination detection and reviewing several related approaches. Then, we introduce the proposed FactAlign approach for hallucination detection and discuss how we can use it to classify hallucinations as either intrinsic or extrinsic. Experiments are carried out to evaluate the proposed method against state-of-the-art methods on the hallucination detection task using the WikiBio GPT-3 hallucination dataset, and on the hallucination type classification task using the XSum hallucination annotations dataset. The experimental results show that our method achieves a 0.889 F1 score for hallucination detection and a 0.825 F1 score for hallucination type classification, without any further training, fine-tuning, or producing multiple samples of the LLM response.

2022


SNLP at TextGraphs 2022 Shared Task: Unsupervised Natural Language Premise Selection in Mathematical Texts Using Sentence-MPNet
Paul Trust | Provia Kadusabe | Haseeb Younis | Rosane Minghim | Evangelos Milios | Ahmed Zahran
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

This paper describes our system for the submission to the TextGraphs 2022 shared task at COLING 2022: Natural Language Premise Selection (NLPS) from mathematical texts. The NLPS task is to select, from a knowledge base written in natural language and mathematical formulae, the mathematical statements (called premises) that are most likely to be used in a given mathematical proof. We formulated this task as an unsupervised semantic similarity task by first obtaining contextualized embeddings of both the premises and the mathematical proofs using sentence transformers. We then computed the cosine similarity between the embeddings of premises and proofs and selected the premises with the highest cosine scores as the most probable. Our system improves over the baseline system, which uses bag-of-words models based on term frequency-inverse document frequency (TF-IDF), in terms of mean average precision (MAP) by about 23.5% (0.1516 versus 0.1228).
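The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes the embeddings have already been computed; the three-dimensional toy vectors, the premise names, and the function names `cosine` and `rank_premises` are hypothetical stand-ins and are not taken from the paper or from any sentence-transformer library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_premises(proof_vec, premise_vecs, top_k=2):
    """Return the top_k (name, score) pairs, highest cosine score first."""
    scored = [(name, cosine(proof_vec, vec)) for name, vec in premise_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): real ones would come from a sentence encoder.
premises = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.7, 0.7, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
proof = [1.0, 0.2, 0.0]

top = rank_premises(proof, premises, top_k=2)
print([name for name, _ in top])
```

No training is involved in this sketch: the premises are simply sorted by similarity to the proof embedding, which matches the unsupervised framing in the abstract.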


UCCNLP@SMM4H’22: Label distribution aware long-tailed learning with post-hoc posterior calibration applied to text classification
Paul Trust | Provia Kadusabe | Ahmed Zahran | Rosane Minghim | Kizito Omala
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

The paper describes our submissions for the Social Media Mining for Health (SMM4H) workshop 2022 shared tasks. We participated in two tasks: (1) classification of adverse drug event (ADE) mentions in English tweets (Task 1a) and (2) classification of self-reported intimate partner violence (IPV) on Twitter (Task 7). We proposed an approach that uses RoBERTa (A Robustly Optimized BERT Pretraining Approach) fine-tuned with a label distribution-aware margin loss function and post-hoc posterior calibration for robust inference against class imbalance. We achieved a 4% and 1% increase in performance on IPV and ADE, respectively, compared with the traditional fine-tuning strategy with unweighted cross-entropy loss.

2019


A Character Level Convolutional BiLSTM for Arabic Dialect Identification
Mohamed Elaraby | Ahmed Zahran
Proceedings of the Fourth Arabic Natural Language Processing Workshop

In this paper, we describe the CU-RAISA team contribution to the 2019 Madar shared task 2, which focused on Twitter user fine-grained dialect identification. Among participating teams, our system ranked 4th, with a 61.54% F1-Macro measure. Our system is a character-level convolutional bidirectional long short-term memory network trained on 2k users’ data. We show that training on concatenated user tweets as input is superior to training on user tweets separately and assigning the user’s label as the mode of the predictions for that user’s tweets.
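The two user-level strategies compared in this abstract can be sketched as follows. This is an illustration only: the function names and the dialect labels are hypothetical, and the classifier itself is omitted.

```python
from collections import Counter


def user_label_by_mode(tweet_predictions):
    """Per-tweet strategy: the user's label is the most frequent tweet-level prediction."""
    if not tweet_predictions:
        raise ValueError("at least one tweet prediction is required")
    return Counter(tweet_predictions).most_common(1)[0][0]


def concatenate_tweets(tweets, sep=" "):
    """Concatenation strategy: merge a user's tweets into one input for a single prediction."""
    return sep.join(tweets)


# Hypothetical per-tweet predictions for one user.
predictions = ["Cairo", "Cairo", "Tunis"]
print(user_label_by_mode(predictions))

# Hypothetical tweets merged into a single model input.
print(concatenate_tweets(["first tweet", "second tweet"]))
```

In the per-tweet strategy each tweet is classified on its own and the results are aggregated afterwards, whereas the concatenation strategy gives the model all of a user's text at once; the abstract reports that the latter worked better.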
