2024
HalluSafe at SemEval-2024 Task 6: An NLI-based Approach to Make LLMs Safer by Better Detecting Hallucinations and Overgeneration Mistakes
Zahra Rahimi | Hamidreza Amirzadeh | Alireza Sohrabi | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The advancement of large language models (LLMs), their ability to produce eloquent and fluent content, and their vast knowledge have resulted in their usage in various tasks and applications. Despite its fluency, this content can contain fabricated or false information. This problem is known as hallucination and has reduced confidence in the output of LLMs. In this work, we use Natural Language Inference to train classifiers for hallucination detection to tackle SemEval-2024 Task 6, SHROOM (Mickus et al., 2024), which is defined over three sub-tasks: Paraphrase Generation, Machine Translation, and Definition Modeling. We have also conducted experiments on LLMs to evaluate their ability to detect hallucinated outputs. We achieved 75.93% and 78.33% accuracy for the model-aware and model-agnostic tracks, respectively. Links to our models and code are available on GitHub.
NIMZ at SemEval-2024 Task 9: Evaluating Methods in Solving Brainteasers Defying Commonsense
Zahra Rahimi | Mohammad Moein Shirzady | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The goal and dream of the artificial intelligence field has long been the development of intelligent systems or agents that mimic human behavior and thinking. Creativity is an essential trait in humans that is closely related to lateral thinking. The remarkable advancements in language models have led to extensive research on question answering and on explicit and implicit reasoning involving vertical thinking. However, there is an increasing need to shift focus towards research and development of models that can think laterally. In lateral thinking, one must step outside the traditional frame of commonsense concepts to reach a conclusion. Task 9 of SemEval-2024 is Brainteaser (Jiang et al., 2024), which requires lateral thinking to answer riddle-like multiple-choice questions. In our study, we assessed the performance of various models on the Brainteaser task. We achieved an overall accuracy of 75% for the Sentence Puzzle subtask and 66.7% for the Word Puzzle subtask. All the code, along with links to our saved models, is available on our GitHub.
Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text
Seyedeh Fatemeh Ebrahimi | Karim Akhavan Azari | Amirmasoud Iravani | Arian Qazvini | Pouya Sadeghi | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
In this paper, we address the detection of machine-generated text (MGT) within Natural Language Processing (NLP). Our approach fine-tunes a RoBERTa-base Transformer to tackle MGT detection as a binary classification task. Focusing on Subtask A (Monolingual - English) of SemEval-2024 Task 8, our system achieves 78.9% accuracy on the test dataset, placing us 57th among participants. While our system is proficient at identifying human-written texts, it has difficulty accurately detecting MGTs.
Sharif-STR at SemEval-2024 Task 1: Transformer as a Regression Model for Fine-Grained Scoring of Textual Semantic Relations
Seyedeh Fatemeh Ebrahimi | Karim Akhavan Azari | Amirmasoud Iravani | Hadi Alizadeh | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper explores semantic textual relatedness (STR) by fine-tuning the RoBERTa transformer model, focusing on sentence-level STR within Track A (Supervised). The study evaluates the effectiveness of this approach across different languages, with promising results in English and Spanish but weaker results in Arabic.
2023
Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation
Zeinab Taghavi | Parsa Haghighi Naeini | Mohammad Ali Sadraei Javaheri | Soroush Gooran | Ehsaneddin Asgari | Hamid Reza Rabiee | Hossein Sameti
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt.