Hossein Sameti

Also published as: H. Sameti

2024

HalluSafe at SemEval-2024 Task 6: An NLI-based Approach to Make LLMs Safer by Better Detecting Hallucinations and Overgeneration Mistakes
Zahra Rahimi | Hamidreza Amirzadeh | Alireza Sohrabi | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The advancement of large language models (LLMs), their ability to produce eloquent and fluent content, and their vast knowledge have resulted in their usage in various tasks and applications. Despite generating fluent content, this content can contain fabricated or false information. This problem is known as hallucination and has reduced the confidence in the output of LLMs. In this work, we have used Natural Language Inference to train classifiers for hallucination detection to tackle SemEval-2024 Task 6-SHROOM (Mickus et al., 2024) which is defined in three sub-tasks: Paraphrase Generation, Machine Translation, and Definition Modeling. We have also conducted experiments on LLMs to evaluate their ability to detect hallucinated outputs. We have achieved 75.93% and 78.33% accuracy for the model-aware and model-agnostic tracks, respectively. The shared links of our models and the codes are available on GitHub.

NIMZ at SemEval-2024 Task 9: Evaluating Methods in Solving Brainteasers Defying Commonsense
Zahra Rahimi | Mohammad Moein Shirzady | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The goal and dream of the artificial intelligence field have long been the development of intelligent systems or agents that mimic human behavior and thinking. Creativity is an essential trait in humans that is closely related to lateral thinking. The remarkable advancements in Language Models have led to extensive research on question-answering and explicit and implicit reasoning involving vertical thinking. However, there is an increasing need to shift focus towards research and development of models that can think laterally. In lateral thinking, one must step outside the traditional frame of commonsense concepts to reach a conclusion. Task 9 of SemEval-2024 is Brainteaser (Jiang et al., 2024), which requires lateral thinking to answer riddle-like multiple-choice questions. In our study, we assessed the performance of various models for the Brainteaser task. We achieved an overall accuracy of 75% for the Sentence Puzzle subtask and 66.7% for the Word Puzzle subtask. All the codes, along with the links to our saved models, are available on our GitHub.

Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text
Seyedeh Fatemeh Ebrahimi | Karim Akhavan Azari | Amirmasoud Iravani | Arian Qazvini | Pouya Sadeghi | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we delve into the realm of detecting machine-generated text (MGT) within Natural Language Processing (NLP). Our approach involves fine-tuning a RoBERTa-base Transformer, a robust neural architecture, to tackle MGT detection as a binary classification task. Specifically focusing on Subtask A (Monolingual - English) within the SemEval-2024 competition framework, our system achieves a 78.9% accuracy on the test dataset, placing us 57th among participants. While our system demonstrates proficiency in identifying human-written texts, it faces challenges in accurately discerning MGTs.

Sharif-STR at SemEval-2024 Task 1: Transformer as a Regression Model for Fine-Grained Scoring of Textual Semantic Relations
Seyedeh Fatemeh Ebrahimi | Karim Akhavan Azari | Amirmasoud Iravani | Hadi Alizadeh | Zeinab Taghavi | Hossein Sameti
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper explores semantic textual relatedness (STR) using fine-tuning techniques on the RoBERTa transformer model, focusing on sentence-level STR within Track A (Supervised). The study evaluates the effectiveness of this approach across different languages, with promising results in English and Spanish but encountering challenges in Arabic.

Language models, particularly generative models, are susceptible to hallucinations, generating outputs that contradict factual knowledge or the source text. This study explores methods for detecting hallucinations in three SemEval-2024 Task 6 tasks: Machine Translation, Definition Modeling, and Paraphrase Generation. We evaluate two methods: semantic similarity between the generated text and factual references, and an ensemble of language models that judge each other’s outputs. Our results show that semantic similarity achieves moderate accuracy and correlation scores in trial data, while the ensemble method offers insights into the complexities of hallucination detection but falls short of expectations. This work highlights the challenges of hallucination detection and underscores the need for further research in this critical area.

Flatness-Aware Gradient Descent for Safe Conversational AI
Leila Khalatbari | Saeid Hosseini | Hossein Sameti | Pascale Fung
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)

As generative dialog models become ubiquitous in real-world applications, it is paramount to ensure a harmless generation. There are two major challenges when enforcing safety to open-domain chatbots. Firstly, it is impractical to provide training data reflecting the desired response to all emerging forms of toxicity (generalisation challenge). Secondly, implementing safety features may compromise the quality of the conversation (trade-off challenge). To tackle the challenges, this paper introduces a regularized fine-tuning approach called FlatGD. By employing a safety-tailored loss, we translate better optimization to more safety. To ensure better optimization, FlatGD penalizes sharp trajectories of loss curve, encouraging flatness of the converged local minima. Experimental results on datasets of “BAD” and “prosocial dialog” demonstrate that our model outperforms the current baselines in reducing toxicity while preserving the conversation quality. Moreover, compared to other baselines, FlatGD can better generalize to unseen toxic data.

2023

SUTNLP at SemEval-2023 Task 4: LG-Transformer for Human Value Detection
Hamed Hematian Hemati | Sayed Hesam Alavian | Hossein Sameti | Hamid Beigy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

When we interact with other humans, human values guide us to consider the human element. As we shall see, value analysis in NLP has been applied to personality profiling but not to argument mining. As part of SemEval-2023 Shared Task 4, our system paper describes a multi-label classifier for identifying human values. Human value detection requires multi-label classification since each argument may contain multiple values. In this paper, we propose an architecture called Label Graph Transformer (LG-Transformer). LG-Transformer is a two-stage pipeline consisting of a transformer jointly encoding argument and labels and a graph module encoding and obtaining further interactions between labels. Using adversarial training, we can boost performance even further. Our best method scored 50.00 using F1 score on the test set, which is 7.8 higher than the best baseline method. Our code is publicly available on Github.

SUTNLP at SemEval-2023 Task 10: RLAT-Transformer for explainable online sexism detection
Hamed Hematian Hemati | Sayed Hesam Alavian | Hamid Beigy | Hossein Sameti
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

There is no simple definition of sexism, but it can be described as prejudice, stereotyping, or discrimination, especially against women, based on their gender. In online interactions, sexism is common. One out of ten American adults says that they have been harassed because of their gender and have been the target of sexism, so sexism is a growing issue. The Explainable Detection of Online Sexism shared task in SemEval-2023 aims at building sexism detection systems for the English language. In order to address the problem, we use large language models such as RoBERTa and DeBERTa. In addition, we present Random Layer Adversarial Training (RLAT) for transformers, and show its significant impact on solving all subtasks. Moreover, we use virtual adversarial training and contrastive learning to improve performance on subtask A. Upon completion of subtask A, B, and C test sets, we obtained macro-F1 of 84.45, 67.78, and 52.52, respectively outperforming proposed baselines on all subtasks. Our code is publicly available on Github.

Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation
Zeinab Taghavi | Parsa Haghighi Naeini | Mohammad Ali Sadraei Javaheri | Soroush Gooran | Ehsaneddin Asgari | Hamid Reza Rabiee | Hossein Sameti
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt.

2022

Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval
Sayed Hesam Alavian | Ali Satvaty | Sadra Sabouri | Ehsaneddin Asgari | Hossein Sameti
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative answers based on users’ needs. This paper discusses our proposed approach, Docalog, for the DialDoc-22 (MultiDoc2Dial) shared task. Docalog identifies the most relevant knowledge in the associated document, in a multi-document setting. Docalog is a three-stage pipeline consisting of (1) a document retriever model (DR. TEIT), (2) an answer span prediction model, and (3) an ultimate span picker deciding on the most likely answer span, out of all predicted spans. In the test phase of MultiDoc2Dial 2022, Docalog achieved F1 scores of 36.07% and 28.44% and SacreBLEU scores of 23.70% and 20.52% on the MDD-SEEN and MDD-UNSEEN folds, respectively.

2019

Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification
Ehsan Doostmohammadi | Hossein Sameti | Ali Saffar
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents the models submitted by the Ghmerti team for subtasks A and B of the OffensEval shared task at SemEval 2019. OffensEval addresses the problem of identifying and categorizing offensive language in social media in three subtasks: whether or not content is offensive (subtask A), whether it is targeted (subtask B), and whether the target is an individual, a group, or another entity (subtask C). The proposed approach includes a character-level Convolutional Neural Network, a word-level Recurrent Neural Network, and some preprocessing. The performance achieved by the proposed model is 77.93% macro-averaged F1-score.

This paper describes building statistical language models for the Persian language from a corpus and incorporating them into a Persian continuous speech recognition (CSR) system. We used the Persian Text Corpus to build the language models. First, we preprocessed the corpus texts by correcting the differing orthography of words. We also reduced the number of POS tags by clustering them manually. Then we extracted word-based monogram and POS-based bigram and trigram language models from the corpus. We also present the procedure for incorporating the language models into a Persian CSR system. Using the language models, a 27.4% reduction in word error rate was achieved in the best case.
