BrainLlama at SemEval-2024 Task 6: Prompting Llama to detect hallucinations and related observable overgeneration mistakes

Marco Siino

doi:10.18653/v1/2024.semeval-1.14

Abstract

Participants in SemEval-2024 Task 6 were asked to perform binary classification to detect fluent overgeneration hallucinations, that is, grammatically sound outputs that contain incorrect or unsupported semantic information. The task comprised two tracks: a model-aware track, in which the organizers provided, for every data point, a checkpoint of the publicly available HuggingFace model responsible for producing the output, and a model-agnostic track, in which no such checkpoint was provided. In this paper, we discuss the application of a Llama model to both tracks. Our approach reaches an accuracy of 0.62 on the model-agnostic track and 0.67 on the model-aware track.
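The abstract does not give the prompt or the decoding setup, so the following is only a minimal sketch of how prompt-based binary hallucination detection and its accuracy score could be wired together. The prompt wording, the function names (`build_prompt`, `parse_answer`, `classify`, `accuracy`), and the `generate` callable standing in for a Llama model are all assumptions made for illustration, not the author's implementation.

```python
from typing import Callable, List, Sequence, Tuple


def build_prompt(source: str, hypothesis: str) -> str:
    """Build a yes/no question about one output (hypothetical wording)."""
    return (
        "Source: " + source + "\n"
        "Output: " + hypothesis + "\n"
        "Does the output contain information that is incorrect or "
        "not supported by the source? Answer yes or no."
    )


def parse_answer(reply: str) -> bool:
    """Map a free-text reply to a label: True means 'hallucination'."""
    return reply.strip().lower().startswith("yes")


def classify(
    examples: Sequence[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[bool]:
    """Label each (source, hypothesis) pair using any text-generation callable."""
    return [parse_answer(generate(build_prompt(s, h))) for s, h in examples]


def accuracy(predictions: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

In this sketch, `generate` can be any function that takes a prompt string and returns the model's reply, so the same code would serve both tracks; only the model behind `generate` would differ.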

Anthology ID: 2024.semeval-1.14
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 82–87
URL: https://aclanthology.org/2024.semeval-1.14/
DOI: 10.18653/v1/2024.semeval-1.14
Cite (ACL): Marco Siino. 2024. BrainLlama at SemEval-2024 Task 6: Prompting Llama to detect hallucinations and related observable overgeneration mistakes. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 82–87, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): BrainLlama at SemEval-2024 Task 6: Prompting Llama to detect hallucinations and related observable overgeneration mistakes (Siino, SemEval 2024)
PDF: https://aclanthology.org/2024.semeval-1.14.pdf
Supplementary material: 2024.semeval-1.14.SupplementaryMaterial.txt
Supplementary material: 2024.semeval-1.14.SupplementaryMaterial.zip
