Behrouz Minaei-Bidgoli

Also published as: Behrouz Minaei, Behrouz Minaei-bidgoli


Pars-ABSA: a Manually Annotated Aspect-based Sentiment Analysis Benchmark on Farsi Product Reviews
Taha Shangipour ataei | Kamyar Darvishi | Soroush Javdan | Behrouz Minaei-Bidgoli | Sauleh Eetemadi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Due to the increased availability of online reviews, sentiment analysis has attracted thriving interest from researchers. Sentiment analysis is a computational treatment of sentiment used to extract and understand the opinions of authors. While many systems were built to predict the sentiment of a document or a sentence, many others provide the necessary detail on various aspects of the entity (i.e., aspect-based sentiment analysis). Most of the available data resources were tailored to English and the other popular European languages. Although Farsi is a language with more than 110 million speakers, to the best of our knowledge, there is a lack of proper public datasets on aspect-based sentiment analysis for Farsi. This paper provides a manually annotated Farsi dataset, Pars-ABSA, annotated and verified by three native Farsi speakers. The dataset consists of 5,114 positive, 3,061 negative and 1,827 neutral data samples from 5,602 unique reviews. Moreover, as a baseline, this paper reports the performance of some aspect-based sentiment analysis methods focusing on transfer learning on Pars-ABSA.


A Sample-Based Training Method for Distantly Supervised Relation Extraction with Pre-Trained Transformers
Mehrdad Nasser | Mohamad Bagher Sajadi | Behrouz Minaei-Bidgoli
Proceedings of the Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

ParsFEVER: a Dataset for Farsi Fact Extraction and Verification
Majid Zarharan | Mahsa Ghaderan | Amin Pourdabiri | Zahra Sayedi | Behrouz Minaei-Bidgoli | Sauleh Eetemadi | Mohammad Taher Pilehvar
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Training and evaluation of automatic fact extraction and verification techniques require large amounts of annotated data which might not be available for low-resource languages. This paper presents ParsFEVER: the first publicly available Farsi dataset for fact extraction and verification. We adopt the construction procedure of the standard English dataset for the task, i.e., FEVER, and improve it for the case of low-resource languages. Specifically, claims are extracted from sentences that are carefully selected to be more informative. The dataset comprises nearly 23K manually-annotated claims. Over 65% of the claims in ParsFEVER are many-hop (require evidence from multiple sources), making the dataset a challenging benchmark (only 13% of the claims in FEVER are many-hop). Also, despite having a smaller training set (around one-ninth of that in FEVER), a model trained on ParsFEVER attains similar downstream performance, indicating the quality of the dataset. We release the dataset and the annotation guidelines at


IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using Deep Neural Networks and Linear Baselines
Soroush Javdan | Taha Shangipour ataei | Behrouz Minaei-Bidgoli
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Sentiment Analysis is a well-studied field of Natural Language Processing. However, the rapid growth of social media and the noisy content within it pose significant challenges in addressing this problem with well-established methods and tools. One of these challenges is code-mixing, which means using different languages to convey thoughts in social media texts. Our group, named IUST (username: TAHA), participated in the SemEval-2020 shared task 9 on Sentiment Analysis for Code-Mixed Social Media Text, and we attempted to develop a system to predict the sentiment of a given code-mixed tweet. We used different preprocessing techniques and proposed different methods, ranging from NBSVM to more complicated deep neural network models. Our best performing method obtains an F1 score of 0.751 for the Spanish-English sub-task and 0.706 for the Hindi-English sub-task.

Applying Transformers and Aspect-based Sentiment Analysis approaches on Sarcasm Detection
Taha Shangipour ataei | Soroush Javdan | Behrouz Minaei-Bidgoli
Proceedings of the Second Workshop on Figurative Language Processing

Sarcasm is a type of figurative language broadly adopted in social media and daily conversations. Sarcasm can ultimately alter the meaning of a sentence, which makes the opinion analysis process error-prone. In this paper, we propose to employ bidirectional encoder representations from transformers (BERT) and aspect-based sentiment analysis approaches to extract the relation between the context dialogue sequence and the response and to determine whether or not the response is sarcastic. Our best performing method obtains an F1 score of 0.73 on the Twitter dataset and 0.734 on the Reddit dataset in the Shared Task 2020 of the Second Workshop on Figurative Language Processing.


An Empirical Study on the Effect of Morphological and Lexical Features in Persian Dependency Parsing
Mojtaba Khallash | Ali Hadian | Behrouz Minaei-Bidgoli
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages


A Framework for Spelling Correction in Persian Language Using Noisy Channel Model
Mohammad Hoseyn Sheykholeslam | Behrouz Minaei-Bidgoli | Hossein Juzi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Several methods have been proposed for spelling correction in Farsi (Persian). Unfortunately, no powerful framework has been implemented, because no large Farsi training set has been available to build an accurate model. We obtained a training set of erroneous strings paired with their corrections from a large number of books, each of which was typed twice at the Computer Research Center of Islamic Sciences, and trained our error model on this large set. In the testing phase, after finding erroneous words in a sample text, our program proposes candidate corrections for each. The paper focuses on the method for ranking these candidates, a version of Noisy Channel Spelling Correction customized for Farsi. The ranking method attempts to find the intended correction c for a typo t that maximizes P(c) P(t | c). Different methods are described and analyzed to give a wide overview of the field. Our evaluation results show that the Noisy Channel Model, using our corpus and training set in this framework, works more accurately and efficiently than other methods.
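
The ranking rule in this abstract, choosing the correction c that maximizes P(c) P(t | c), can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `rank_corrections`, the add-one smoothing, the unigram language model, and the toy English counts standing in for the Farsi corpus and the double-typed book pairs.

```python
import math
from collections import Counter


def rank_corrections(typo, candidates, word_counts, error_counts, smoothing=1.0):
    """Rank candidate corrections c for a typo t by log P(c) + log P(t | c).

    word_counts:  Counter of word -> frequency (hypothetical language model data).
    error_counts: Counter of (intended, typed) -> frequency (hypothetical error
                  model data, e.g. pairs aligned from text that was typed twice).
    """
    total_words = sum(word_counts.values())
    vocab_size = len(word_counts)
    # Number of distinct typed forms, used only to smooth P(t | c).
    typed_forms = len({typed for (_, typed) in error_counts}) or 1

    scored = []
    for c in candidates:
        # Language model P(c): smoothed unigram relative frequency.
        p_c = (word_counts.get(c, 0) + smoothing) / (total_words + smoothing * vocab_size)
        # Error model P(t | c): how often c was typed as t, out of all typings of c.
        times_c_typed = sum(n for (intended, _), n in error_counts.items() if intended == c)
        p_t_given_c = (error_counts.get((c, typo), 0) + smoothing) / (
            times_c_typed + smoothing * typed_forms
        )
        scored.append((math.log(p_c) + math.log(p_t_given_c), c))

    # Highest combined log-probability first.
    return [c for _, c in sorted(scored, reverse=True)]


# Toy data (assumed, not from the paper).
word_counts = Counter({"their": 50, "there": 80, "three": 20})
error_counts = Counter({
    ("their", "thier"): 9, ("their", "their"): 40,
    ("there", "thier"): 1, ("there", "there"): 70,
    ("three", "three"): 20,
})
ranked = rank_corrections("thier", ["their", "there", "three"], word_counts, error_counts)
print(ranked)
```

Working in log space avoids underflow when the probabilities are small. Here "there" is the more frequent word overall, but the error model outweighs that because "their" was mistyped as "thier" far more often in the toy pairs.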

Improving K-Nearest Neighbor Efficacy for Farsi Text Classification
Mohammad Hossein Elahimanesh | Behrouz Minaei | Hossein Malekinezhad
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Text classification is one of the common tasks in text mining. Because of the complex nature of the Farsi language, which includes words with separate parts and compound verbs, most text classification systems are not applicable to Farsi texts. K-Nearest Neighbors (KNN) is one of the most popular methods for text classification and performs well in experiments on different datasets. This paper proposes a method to improve the classification performance of KNN. The effects of removing or retaining stop words and of applying N-grams of different lengths are also studied. For this study, a portion of a standard Farsi corpus called Hamshahri1 and articles from some archived newspapers are used. The results indicate that classification efficiency improves with this approach, especially when eight-gram indexing and stop-word removal are applied. Using N-grams longer than 3 characters gave very encouraging results for Farsi text classification. The classification results of our method are compared with the results reported in the related work.
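
As a rough illustration of character N-gram indexing combined with KNN, the sketch below classifies a text by majority vote among its most similar training texts. It is plain KNN with cosine similarity over N-gram counts, not the improved method the paper proposes (the abstract does not describe it). The function names, the toy English training set, and the choice n=3 are assumptions; the abstract reports its best results with eight-grams, which would need longer texts than these.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams after collapsing whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(text, training, k=3, n=3):
    """Label `text` by majority vote among its k most similar training texts."""
    query = char_ngrams(text, n)
    neighbours = sorted(
        training,
        key=lambda item: cosine(query, char_ngrams(item[0], n)),
        reverse=True,
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training set (assumed, not from the Hamshahri corpus).
training = [
    ("the team won the football match", "sports"),
    ("the player scored a goal in the match", "sports"),
    ("football fans cheered the team", "sports"),
    ("the bank raised interest rates", "economy"),
    ("stock market prices fell sharply", "economy"),
    ("the central bank cut rates", "economy"),
]
label = knn_classify("the football team scored", training)
print(label)
```

Character N-grams need no tokenizer, which is one reason they are often tried for languages where word boundaries are hard to detect. Stop-word removal, which the abstract also studies, would be a preprocessing step applied before `char_ngrams`.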


A Persian Part-Of-Speech Tagger Based on Morphological Analysis
Mahdi Mohseni | Behrouz Minaei-bidgoli
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech (POS) tagging system. This is a main part of a process for expanding a large Persian corpus called Peykare (or Textual Corpus of Persian Language). Peykare is arranged into two parts: annotated and unannotated. We use the annotated part to create an automatic morphological analyzer, a main component of the system. Morphosyntactic features of Persian words cause two problems: the number of tags in the corpus increases (586 tags) and the form of the words changes. The high number of tags prevents any tagger from working efficiently. In addition, the change in word forms reduces the frequency of words with the same lemma, and the number of words belonging to a specific tag decreases as well. This also harms statistical taggers. By removing these problems, the morphological analyzer helps the tagger cover a large number of tags in the corpus. The method is evaluated on the corpus using a Markov tagger. The experiments show the efficiency of the method in Persian POS tagging.