Hamid Beigy


2024

Consistency Training by Synthetic Question Generation for Conversational Question Answering
Hamed Hematian Hemati | Hamid Beigy
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Efficiently modeling historical information is critical for addressing user queries in conversational question answering (QA), as historical context plays a vital role in clarifying the user’s questions. However, irrelevant history introduces noise into the reasoning process, especially for questions with a long conversational history. In our novel model-agnostic approach, referred to as **CoTaH** (**Co**nsistency-**T**rained **a**ugmented **H**istory), we augment the historical information with synthetic questions and then employ consistency training to train a model that utilizes both real and augmented history, implicitly making its reasoning robust to irrelevant history. To the best of our knowledge, this is the first work to use synthetic question generation as a form of data augmentation for modeling conversational QA. By pointing out a common modeling error prevalent in previous research, we introduce a new baseline and compare our model’s performance against it, demonstrating improved results, particularly in later turns of the conversation, where questions come with a large historical context.

Zero-Shot Learning and Key Points Are All You Need for Automated Fact-Checking
Mohammad Ghiasvand Mohammadkhani | Ali Ghiasvand Mohammadkhani | Hamid Beigy
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)

Automated fact-checking is an important task because determining the accurate status of a proposed claim within the vast amount of information available online is a critical challenge. This challenge requires robust evaluation to prevent the spread of false information. Modern large language models (LLMs) have demonstrated strong capabilities across a diverse range of Natural Language Processing (NLP) tasks. With proper prompting strategies, their versatility, which stems from their ability to handle large context sizes and their zero-shot learning ability, enables them to simulate human problem-solving intuition and to move toward serving as an alternative to humans in solving problems. In this work, we introduce a straightforward framework based on _**Z**ero-**S**hot **L**earning_ and _**Ke**y **P**oints_ (ZSL-KeP) for automated fact-checking, which, despite its simplicity, performed well on the AVeriTeC shared task dataset, robustly improving on the baseline and achieving 10th place.

2023

Borderless Azerbaijani Processing: Linguistic Resources and a Transformer-based Approach for Azerbaijani Transliteration
Reihaneh Zohrabi | Mostafa Masumi | Omid Ghahroodi | Parham AbedAzad | Hamid Beigy | Mohammad Hossein Rohban | Ehsaneddin Asgari
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

SUTNLP at SemEval-2023 Task 4: LG-Transformer for Human Value Detection
Hamed Hematian Hemati | Sayed Hesam Alavian | Hossein Sameti | Hamid Beigy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

When we interact with other humans, human values guide us to consider the human element. As we shall see, value analysis in NLP has been applied to personality profiling but not to argument mining. As part of SemEval-2023 Shared Task 4, our system paper describes a multi-label classifier for identifying human values. Human value detection requires multi-label classification, since each argument may contain multiple values. In this paper, we propose an architecture called Label Graph Transformer (LG-Transformer). LG-Transformer is a two-stage pipeline consisting of a transformer that jointly encodes the argument and the labels, and a graph module that captures further interactions between labels. Using adversarial training, we boost performance even further. Our best method achieved an F1 score of 50.00 on the test set, 7.8 points higher than the best baseline method. Our code is publicly available on GitHub.

SUTNLP at SemEval-2023 Task 10: RLAT-Transformer for explainable online sexism detection
Hamed Hematian Hemati | Sayed Hesam Alavian | Hamid Beigy | Hossein Sameti
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

There is no simple definition of sexism, but it can be described as prejudice, stereotyping, or discrimination, especially against women, based on their gender. Sexism is common in online interactions: one in ten American adults report having been harassed because of their gender and having been the target of sexism, so sexism is a growing issue. The Explainable Detection of Online Sexism shared task in SemEval-2023 aims to build sexism detection systems for the English language. To address the problem, we use large language models such as RoBERTa and DeBERTa. In addition, we present Random Layer Adversarial Training (RLAT) for transformers and show its significant impact on all subtasks. Moreover, we use virtual adversarial training and contrastive learning to improve performance on subtask A. On the test sets of subtasks A, B, and C, we obtained macro-F1 scores of 84.45, 67.78, and 52.52, respectively, outperforming the proposed baselines on all subtasks. Our code is publicly available on GitHub.

2022

Persian Natural Language Inference: A Meta-learning Approach
Heydar Soudani | Mohammad Hassan Mojab | Hamid Beigy
Proceedings of the 29th International Conference on Computational Linguistics

Incorporating information from other languages can improve results on tasks in low-resource languages. A powerful method of building functional natural language processing systems for low-resource languages is to combine multilingual pre-trained representations with cross-lingual transfer learning. In general, however, shared representations are learned separately, either across tasks or across languages. This paper proposes a meta-learning approach for natural language inference in Persian. The meta-learning approach alternately draws on information from a different task (such as QA in Persian) or from another language (such as natural language inference in English). We also investigate the role of a task augmentation strategy for forming additional high-quality tasks. We evaluate the proposed method using four languages and an auxiliary task. The proposed model consistently outperforms the baseline approach, improving accuracy by roughly six percent. We also examine the effect of finding appropriate initial parameters using zero-shot evaluation and CCA similarity.

2021

ParsTwiNER: A Corpus for Named Entity Recognition at Informal Persian
MohammadMahdi Aghajani | AliAkbar Badri | Hamid Beigy
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Because of unstructured sentences, misspellings, and other errors, finding named entities in a noisy environment such as social media takes much more effort. ParsTwiNER contains about 250k tokens gathered from Persian Twitter and annotated following standard guidelines such as MUC-6 and CoNLL 2003. Inter-annotator agreement, measured with Cohen’s kappa coefficient, is 0.95, a high score. In this study, we demonstrate that some state-of-the-art models degrade on this corpus, and we train a new model using parallel transfer learning based on the BERT architecture. Experimental results show that the model works well on informal as well as formal Persian.