Ehsaneddin Asgari - ACL Anthology

Ehsaneddin Asgari

2026

HarfoSokhan: A Comprehensive Parallel Dataset for Transitions between Persian Colloquial and Formal Variations
Hamid Jahad Sarvestani | Vida Ramezanian | Saee Saadat | Neda Taghizadeh Serajeh | Maryam Sadat Razavi Taheri | Shohreh Kasaei | MohammadAmin Fazli | Ehsaneddin Asgari
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

A wide array of NLP/NLU models have been developed for the Persian language and have shown promising results. However, the performance of such models drops significantly when applied to the colloquial form of Persian. This challenge arises from the substantial differences between colloquial and formal Persian and the lack of parallel data facilitating the robustness of the model to the colloquial data or to transform the data to formal Persian. In addressing this gap, our research is dedicated to the development of the HarfoSokhan dataset, a large-scale colloquial to formal Persian parallel dataset of 6M sentence pairs. Our proposed dataset is a critical resource for training models that can effectively bridge the linguistic variations between colloquial and formal Persian. To illustrate the utility of our dataset, we used it to train a GPT2 model, which exhibited remarkable proficiency in colloquial to formal text style transfer, outperforming both OpenAI’s GPT-3.5-turbo model and a leading rule-based system in this task. This conclusion is supported by our proposed ranking-based human evaluation. The results underscore the significance of the HarfoSokhan dataset in enhancing the performance of natural language processing models in the challenging task of colloquial to formal Persian conversion.

Detecting Subtle Biases: An Ethical Lens on Underexplored Areas in AI Language Models Biases
Shayan Bali | Farhan Farsi | Mohammad Hosseini | Adel Khorramrouz | Ehsaneddin Asgari
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) are increasingly embedded in the daily lives of individuals across diverse social classes. This widespread integration raises urgent concerns about the subtle, implicit biases these models may contain. In this work, we investigate such biases through the lens of ethical reasoning, analyzing model responses to scenarios in a new dataset we propose comprising 1,016 scenarios, systematically categorized into ethical, unethical, and neutral types. Our study focuses on dimensions that are socially influential but less explored, including (i) residency status, (ii) political ideology, (iii) Fitness Status, (iv) educational attainment, and (v) attitudes toward AI. To assess LLMs’ behavior, we propose a baseline and employ one statistical test and one metric: a permutation test that reveals the presence of bias by comparing the probability distributions of ethical/unethical scenarios with the probability distribution of neutral scenarios on each demographic group, and a tendency measurement that captures the magnitude of bias with respect to the relative difference between probability distribution of ethical and unethical scenarios. Our evaluations of 12 prominent LLMs reveal persistent and nuanced biases across all four attributes, and Llama models exhibited the most pronounced biases. These findings highlight the need for refined ethical benchmarks and bias-mitigation tools in LLMs.

MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment
Omid Ghahroodi | Arshia Hemmat | Marzia Nouri | Seyed Mohammad Hadi Hosseini | Doratossadat Dastgheib | Mohammad Vali Sanian | Alireza Sahebi | Reihaneh Zohrabi | Mohammad Hossein Rohban | Ehsaneddin Asgari | Mahdieh Soleymani Baghshah
Findings of the Association for Computational Linguistics: EACL 2026

Recent advancements in large vision-language models (VLMs) have primarily focused on English, with limited attention given to other languages. To address this gap, we introduce MEENA (also known as PersianMMMU), the first dataset designed to evaluate Persian VLMs across scientific, reasoning, and human-level understanding tasks. Our dataset comprises approximately 7,500 Persian and 3,000 English questions, covering a wide range of topics such as reasoning, mathematics, physics, diagrams, charts, and Persian art and literature. Key features of MEENA include: (1) diverse subject coverage spanning various educational levels, from primary to upper secondary school, (2) rich metadata, including difficulty levels and descriptive answers, (3) original Persian data that preserves cultural nuances, (4) a bilingual structure to assess cross-linguistic performance, and (5) a series of diverse experiments assessing various capabilities, including overall performance, the model’s ability to attend to images, and its tendency to generate hallucinations. We hope this benchmark contributes to enhancing VLM capabilities beyond English.

2025

ImageEval 2025: The First Arabic Image Captioning Shared Task
Ahlam Bashiti | Alaa Aljabari | Hadi Khaled Hamoud | Md. Rafiul Biswas | Bilal Mohammed Shalash | Mustafa Jarrar | Fadi Zaraket | George Mikros | Ehsaneddin Asgari | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses the critical gap in multimodal Arabic NLP by focusing on two complementary subtasks: (1) creating the first open-source, manually-captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.

Taxi1500: A Dataset for Multilingual Text Classification in 1500 Languages
Chunlan Ma | Ayyoob Imani | Haotian Ye | Renhao Pei | Ehsaneddin Asgari | Hinrich Schuetze
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

While broad-coverage multilingual natural language processing tools have been developed, a significant portion of the world’s over 7000 languages are still neglected. One reason is the lack of evaluation datasets that cover a diverse range of languages, particularly those that are low-resource or endangered. To address this gap, we present a large-scale text classification dataset encompassing 1504 languages many of which have otherwise limited or no annotated data. This dataset is constructed using parallel translations of the Bible. We develop relevant topics, annotate the English data through crowdsourcing and project these annotations onto other languages via aligned verses. We benchmark a range of existing multilingual models on this dataset. We make our dataset and code available to the public.

ParsiPy: NLP Toolkit for Historical Persian Texts in Python
Farhan Farsi | Parnian Fazel | Sepand Haghighi | Sadra Sabouri | Farzaneh Goshtasb | Nadia Hajipour | Ehsaneddin Asgari | Hossein Sameti
Proceedings of the Second Workshop on Ancient Language Processing

The study of historical languages presents unique challenges due to their complex ortho-graphic systems, fragmentary textual evidence, and the absence of standardized digital repre-sentations of text in those languages. Tack-ling these challenges needs special NLP digi-tal tools to handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy1, an NLP toolkit designed to facili-tate the analysis of historical Persian languages by offering modules for tokenization, lemma-tization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embed-ding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Per-sian) texts, highlighting its potential for ex-panding computational methods in the study of historical languages. Through this work, we contribute to the field of computational philol-ogy, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.

Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description
Mahshid Dehghani | Amirahmad Shafiee | Ali Shafiei | Neda Fallah | Farahmand Alizadeh | Mohammad Mehdi Gholinejad | Hamid Behroozi | Jafar Habibi | Ehsaneddin Asgari
Findings of the Association for Computational Linguistics: NAACL 2025

3D facial emotion modeling has important applications in areas such as animation design, virtual reality, and emotional human-computer interaction (HCI). However, existing models are constrained by limited emotion classes and insufficient datasets. To address this, we introduce Emo3D, an extensive “Text-Image-Expression dataset” that spans a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, enabling the capture of a broad range of emotional expressions. Using this unique dataset, we perform a comprehensive evaluation of fine-tuned language-based models and vision-language models, such as Contrastive Language-Image Pretraining (CLIP), for 3D facial expression synthesis. To better assess conveyed emotions, we introduce Emo3D metric, a new evaluation metric that aligns more closely with human perception than traditional Mean Squared Error (MSE). Unlike MSE, which focuses on numerical differences, Emo3D captures emotional nuances in visual-text alignment and semantic richness. Emo3D dataset and metric hold great potential for advancing applications in animation and virtual reality.

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi | Amirhosein Zobeiri | Mahdi Dehghani | Mohammadali Mohammadkhani | Bardia Mohammadi | Omid Ghahroodi | Mahdieh Soleymani Baghshah | Ehsaneddin Asgari
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) suffer from hallucinations and outdated knowledge due to their reliance on static training data. Retrieval-Augmented Generation (RAG) mitigates these issues by integrating external dynamic information for improved factual grounding. With advances in multimodal learning, Multimodal RAG extends this approach by incorporating multiple modalities such as text, images, audio, and video to enhance the generated outputs. However, cross-modal alignment and reasoning introduce unique challenges beyond those in unimodal RAG. This survey offers a structured and comprehensive analysis of Multimodal RAG systems, covering datasets, benchmarks, metrics, evaluation, methodologies, and innovations in retrieval, fusion, augmentation, and generation. We review training strategies, robustness enhancements, loss functions, and agent-based approaches, while also exploring the diverse Multimodal RAG scenarios. In addition, we outline open challenges and future directions to guide research in this evolving field. This survey lays the foundation for developing more capable and reliable AI systems that effectively leverage multimodal dynamic external knowledge bases. All resources are publicly available at https://github.com/llm-lab-org/Multimodal-RAG-Survey.

PahGen: Generating Ancient Pahlavi Text via Grammar-guided Zero-shot Translation
Farhan Farsi | Parnian Fazel | Farzaneh Goshtasb | Nadia Hajipour | Sadra Sabouri | Ehsaneddin Asgari | Hossein Sameti
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)

The Pahlavi language, aka Middle Persian, is a critical part of Persian cultural and historical heritage which bridges the Old Persian and Modern Persian (Farsi). However, due to its limited digital presence and the scarcity of comprehensive linguistic resources, Pahlavi is at risk of extinction. As an early attempt to preserve this language, this study introduces a framework to translate English text into Pahlavi. Our approach combines grammar-guided term extraction with zero-shot translation, leveraging large language models (LLMs) to generate syntactically and semantically accurate Pahlavi sentences.This framework aims to preserve the Pahlavi language and serves as a model for reviving other endangered languages with similar characteristics. Finally using our framework, we generate a novel dataset of 360 expert-validated parallel English-Pahlavi texts.

AIMA at SemEval-2025 Task 1: Bridging Text and Image for Idiomatic Knowledge Extraction via Mixture of Experts
Arash Rasouli | Erfan Sadraiye | Omid Ghahroodi | Hamid Rabiee | Ehsaneddin Asgari
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Idioms are integral components of language, playing a crucial role in understanding and processing linguistic expressions. Although extensive research has been conducted on the comprehension of idioms in the text domain, their interpretation in multi-modal spaces remains largely unexplored. In this work, we propose a multi-expert framework to investigate the transfer of idiomatic knowledge from the language to the vision modality. Through a series of experiments, we demonstrate that leveraging text-based representations of idioms can significantly enhance understanding of the visual space, bridging the gap between linguistic and visual semantics.

2024

Transformers for Bridging Persian Dialects: Transliteration Model for Tajiki and Iranian Scripts
MohammadAli SadraeiJavaheri | Ehsaneddin Asgari | Hamid Reza Rabiee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this study, we address the linguistic challenges posed by Tajiki Persian, a distinct variant of the Persian language that utilizes the Cyrillic script due to historical “Russification”. This distinguishes it from other Persian dialects that adopt the Arabic script. Despite its profound linguistic and cultural significance, Tajiki Persian remains a low-resource language with scant digitized datasets for computational applications. To address this deficiency, we created a parallel corpus using Shahnameh, a seminal Persian epic poem. Employing optical character recognition, we extracted Tajiki Persian verses from primary sources and applied a heuristic method to align them with their Iranian Persian counterparts. We then trained and assessed transliteration models using two prominent sequence-to-sequence architectures: GRU with attention and transformer. Our results underscore the enhanced performance of our models, particularly in contrast to pre-trained large multilingual models like GPT-3.5, emphasizing the value of dedicated datasets in advancing computational approaches for underrepresented languages. With the publication of this work, we are disseminating, for the first time, a vast collection of Persian poetry spanning 1000 years, transcribed in Tajiki scripts for the benefit of the Tajiki-speaking communities. The dataset, along with the model’s code and checkpoints, is accessible at https://github.com/language-ml/Tajiki-Shahname, marking a significant contribution to computational linguistic resources for Tajiki Persian.

AIMA at SemEval-2024 Task 10: History-Based Emotion Recognition in Hindi-English Code-Mixed Conversations
Mohammad Mahdi Abootorabi | Nona Ghazizadeh | Seyed Arshan Dalili | Alireza Ghahramani Kure | Mahshid Dehghani | Ehsaneddin Asgari
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this study, we introduce a solution to the SemEval 2024 Task 10 on subtask 1, dedicated to Emotion Recognition in Conversation (ERC) in code-mixed Hindi-English conversations. ERC in code-mixed conversations presents unique challenges, as existing models are typically trained on monolingual datasets and may not perform well on code-mixed data. To address this, we propose a series of models that incorporate both the previous and future context of the current utterance, as well as the sequential information of the conversation. To facilitate the processing of code-mixed data, we developed a Hinglish-to-English translation pipeline to translate the code-mixed conversations into English. We designed four different base models, each utilizing powerful pre-trained encoders to extract features from the input but with varying architectures. By ensembling all of these models, we developed a final model that outperforms all other baselines.

HierarchyEverywhere at SemEval-2024 Task 4: Detection of Persuasion Techniques in Memes Using Hierarchical Text Classifier
Omid Ghahroodi | Ehsaneddin Asgari
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Text classification is an important task in natural language processing. Hierarchical Text Classification (HTC) is a subset of text classification task-type. HTC tackles multi-label classification challenges by leveraging tree structures that delineate relationships between classes, thereby striving to enhance classification accuracy through the utilization of inter-class relationships. Memes, as prevalent vehicles of modern communication within social networks, hold immense potential as instruments for propagandistic dissemination due to their profound impact on users. In SemEval-2024 Task 4, the identification of propaganda and its various forms in memes is explored through two sub-tasks: (i) utilizing only the textual component of memes, and (ii) incorporating both textual and pictorial elements. In this study, we address the proposed problem through the lens of HTC, using state-of-the-art hierarchical text classification methodologies to detect propaganda in memes. Our system achieved first place in English Sub-task 2a, underscoring its efficacy in tackling the complexities inherent in propaganda detection within the meme landscape.

AIMA at SemEval-2024 Task 3: Simple Yet Powerful Emotion Cause Pair Analysis
Alireza Ghahramani Kure | Mahshid Dehghani | Mohammad Mahdi Abootorabi | Nona Ghazizadeh | Seyed Arshan Dalili | Ehsaneddin Asgari
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The SemEval-2024 Task 3 presents two subtasks focusing on emotion-cause pair extraction within conversational contexts. Subtask 1 revolves around the extraction of textual emotion-cause pairs, where causes are defined and annotated as textual spans within the conversation. Conversely, Subtask 2 extends the analysis to encompass multimodal cues, including language, audio, and vision, acknowledging instances where causes may not be exclusively represented in the textual data. Our proposed model for emotion-cause analysis is meticulously structured into three core segments: (i) embedding extraction, (ii) cause-pair extraction & emotion classification, and (iii) cause extraction using QA after finding pairs. Leveraging state-of-the-art techniques and fine-tuning on task-specific datasets, our model effectively unravels the intricate web of conversational dynamics and extracts subtle cues signifying causality in emotional expressions. Our team, AIMA, demonstrated strong performance in the SemEval-2024 Task 3 competition. We ranked as the 10th in subtask 1 and the 6th in subtask 2 out of 23 teams.

TuringQ: Benchmarking AI Comprehension in Theory of Computation
Pardis Sadat Zahraei | Ehsaneddin Asgari
Findings of the Association for Computational Linguistics: EMNLP 2024

We present TuringQ, the first benchmark designed to evaluate the reasoning capabilities of large language models (LLMs) in the theory of computation. TuringQ consists of 4,006 undergraduate and graduate-level question-answer pairs, categorized into four difficulty levels and covering seven core theoretical areas. We evaluate several open-source LLMs, as well as GPT-4, using Chain of Thought prompting and expert human assessment. Additionally, we propose an automated LLM-based evaluation system that demonstrates competitive accuracy when compared to human evaluation. Fine-tuning a Llama3-8B model on TuringQ shows measurable improvements in reasoning ability and out-of-domain tasks such as algebra. TuringQ serves as both a benchmark and a resource for enhancing LLM performance in complex computational reasoning tasks. Our analysis offers insights into LLM capabilities and advances in AI comprehension of theoretical computer science.

The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Nailia Mirzakhmedova | Johannes Kiesel | Milad Alshomary | Maximilian Heinrich | Nicolas Handke | Xiaoni Cai | Valentin Barriere | Doratossadat Dastgheib | Omid Ghahroodi | Mohammad Ali Sadraei Javaheri | Ehsaneddin Asgari | Lea Kawaletz | Henning Wachsmuth | Benno Stein
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

While human values play a crucial role in making arguments persuasive, we currently lack the necessary extensive datasets to develop methods for analyzing the values underlying these arguments on a large scale. To address this gap, we present the Touché23-ValueEval dataset, an expansion of the Webis-ArgValues-22 dataset. We collected and annotated an additional 4780 new arguments, doubling the dataset’s size to 9324 arguments. These arguments were sourced from six diverse sources, covering religious texts, community discussions, free-text arguments, newspaper editorials, and political debates. Each argument is annotated by three crowdworkers for 54 human values, following the methodology established in the original dataset. The Touché23-ValueEval dataset was utilized in the SemEval 2023 Task 4. ValueEval: Identification of Human Values behind Arguments, where an ensemble of transformer models demonstrated state-of-the-art performance. Furthermore, our experiments show that a fine-tuned large language model, Llama-2-7B, achieves comparable results.

2023

SinaAI at SemEval-2023 Task 3: A Multilingual Transformer Language Model-based Approach for the Detection of News Genre, Framing and Persuasion Techniques
Aryan Sadeghi | Reza Alipour | Kamyar Taeb | Parimehr Morassafar | Nima Salemahim | Ehsaneddin Asgari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes SinaAI’s participation in SemEval-2023 Task 3, which involves detecting propaganda in news articles across multiple languages. The task comprises three sub-tasks: (i) genre detection, (ii) news framing,and (iii) persuasion technique identification. The employed dataset includes news articles in nine languages and domains, including English, French, Italian, German, Polish, Russian, Georgian, Greek, and Spanish, with labeled instances of news framing, genre, and persuasion techniques. Our approach combines fine-tuning multilingual language models such as XLM, LaBSE, and mBERT with data augmentation techniques. Our experimental results show that XLM outperforms other models in terms of F1-Micro in and F1-Macro, and the ensemble of XLM and LaBSE achieved the best performance. Our study highlights the effectiveness of multilingual sentence embedding models in multilingual propaganda detection. Our models achieved highest score for two languages (greek and italy) in sub-task 1 and one language (Russian) for sub-task 2.

Borderless Azerbaijani Processing: Linguistic Resources and a Transformer-based Approach for Azerbaijani Transliteration
Reihaneh Zohrabi | Mostafa Masumi | Omid Ghahroodi | Parham AbedAzad | Hamid Beigy | Mohammad Hossein Rohban | Ehsaneddin Asgari
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Sina at SemEval-2023 Task 4: A Class-Token Attention-based Model for Human Value Detection
Omid Ghahroodi | Mohammad Ali Sadraei | Doratossadat Dastgheib | Mahdieh Soleymani Baghshah | Mohammad Hossein Rohban | Hamid Rabiee | Ehsaneddin Asgari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The human values expressed in argumentative texts can provide valuable insights into the culture of a society. They can be helpful in various applications such as value-based profiling and ethical analysis. However, one of the first steps in achieving this goal is to detect the category of human value from an argument accurately. This task is challenging due to the lack of data and the need for philosophical inference. It also can be challenging for humans to classify arguments according to their underlying human values. This paper elaborates on our model for the SemEval 2023 Task 4 on human value detection. We propose a class-token attention-based model and evaluate it against baseline models, including finetuned BERT language model and a keyword-based approach.

Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation
Zeinab Taghavi | Parsa Haghighi Naeini | Mohammad Ali Sadraei | Soroush Gooran | Ehsaneddin Asgari | Hamid Reza Rabiee | Hossein Sameti
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75\% in our best attempt.

SUT at SemEval-2023 Task 1: Prompt Generation for Visual Word Sense Disambiguation
Omid Ghahroodi | Seyed Arshan Dalili | Sahel Mesforoush | Ehsaneddin Asgari
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Visual Word Sense Disambiguation (V-WSD) identifies the correct visual sense of a multi-sense word in a specific context. This can be challenging as images may need to provide additional context and words may have multiple senses. A proper V-WSD system can benefit applications like image retrieval and captioning. This paper proposes a Prompt Generation approach to solve this challenge. This approach improves the robustness of language-image models like CLIP to contextual ambiguities and helps them better correlate between textual and visual contexts of different senses of words.

The Language Model, Resources, and Computational Pipelines for the Under-Resourced Iranian Azerbaijani
Marzia Nouri | Mahsa Amani | Reihaneh Zohrabi | Ehsaneddin Asgari
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

2022

Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval
Sayed Hesam Alavian | Ali Satvaty | Sadra Sabouri | Ehsaneddin Asgari | Hossein Sameti
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative answers based on users’ needs. This paper discusses our proposed approach, Docalog, for the DialDoc-22 (MultiDoc2Dial) shared task. Docalog identifies the most relevant knowledge in the associated document, in a multi-document setting. Docalog, is a three-stage pipeline consisting of (1) a document retriever model (DR. TEIT), (2) an answer span prediction model, and (3) an ultimate span picker deciding on the most likely answer span, out of all predicted spans. In the test phase of MultiDoc2Dial 2022, Docalog achieved f1-scores of 36.07% and 28.44% and SacreBLEU scores of 23.70% and 20.52%, respectively on the MDD-SEEN and MDD-UNSEEN folds.

Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging
Sajad Mirzababaei | Amir Hossein Kargaran | Hinrich Schütze | Ehsaneddin Asgari
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Many NLP main tasks benefit from an accurate understanding of temporal expressions, e.g., text summarization, question answering, and information retrieval. This paper introduces Hengam, an adversarially trained transformer for Persian temporal tagging outperforming state-of-the-art approaches on a diverse and manually created dataset. We create Hengam in the following concrete steps: (1) we develop HengamTagger, an extensible rule-based tool that can extract temporal expressions from a set of diverse language-specific patterns for any language of interest. (2) We apply HengamTagger to annotate temporal tags in a large and diverse Persian text collection (covering both formal and informal contexts) to be used as weakly labeled data. (3) We introduce an adversarially trained transformer model on HengamCorpus that can generalize over the HengamTagger’s rules. We create HengamGold, the first high-quality gold standard for Persian temporal tagging. Our trained adversarial HengamTransformer not only achieves the best performance in terms of the F1-score (a type F1-Score of 95.42 and a partial F1-Score of 91.60) but also successfully deals with language ambiguities and incorrect spellings. Our code, data, and models are publicly available at https://github.com/kargaranamir/Hengam.

Keyword-based Natural Language Premise Selection for an Automatic Mathematical Statement Proving
Doratossadat Dastgheib | Ehsaneddin Asgari
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

Extraction of supportive premises for a mathematical problem can contribute to profound success in improving automatic reasoning systems. One bottleneck in automated theorem proving is the lack of a proper semantic information retrieval system for mathematical texts. In this paper, we show the effect of keyword extraction in the natural language premise selection (NLPS) shared task proposed in TextGraph-16 that seeks to select the most relevant sentences supporting a given mathematical statement.

2021

KnowMAN: Weakly Supervised Multinomial Adversarial Networks
Luisa März | Ehsaneddin Asgari | Fabienne Braune | Franziska Zimmermann | Benjamin Roth
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The absence of labeled data for training neural models is often addressed by leveraging knowledge about the specific task, resulting in heuristic but noisy labels. The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and annotate corresponding labels for training. This process of weakly supervised training may result in an over-reliance on the signals captured by the labeling functions and hinder models to exploit other signals or to generalize well. We propose KnowMAN, an adversarial scheme that enables to control influence of signals associated with specific labeling functions. KnowMAN forces the network to learn representations that are invariant to those signals and to pick up other signals that are more generally associated with an output label. KnowMAN strongly improves results compared to direct weakly supervised learning with a pre-trained transformer language model and a feature-based baseline.

2020

UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
Ehsaneddin Asgari | Fabienne Braune | Benjamin Roth | Christoph Ringlstetter | Mohammad Mofrad
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we introduce UniSent universal sentiment lexica for 1000+ languages. Sentiment lexica are vital for sentiment analysis in absence of document-level annotations, a very common scenario for low-resource languages. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low resource ones. In this work, we use a massively parallel Bible corpus to project sentiment information from English to other languages for sentiment analysis on Twitter data. We introduce a method called DomDrift to mitigate the huge domain mismatch between Bible and Twitter by a confidence weighting scheme that uses domain-specific embeddings to compare the nearest neighbors for a candidate sentiment word in the source (Bible) and target (Twitter) domain. We evaluate the quality of UniSent in a subset of languages for which manually created ground truth was available, Macedonian, Czech, German, Spanish, and French. We show that the quality of UniSent is comparable to manually created sentiment resources when it is used as the sentiment seed for the task of word sentiment prediction on top of embedding representations. In addition, we show that emoticon sentiments could be reliably predicted in the Twitter domain using only UniSent and monolingual embeddings in German, Spanish, French, and Italian. With the publication of this paper, we release the UniSent sentiment lexica at http://language-lab.info/unisent.

EmbLexChange at SemEval-2020 Task 1: Unsupervised Embedding-based Detection of Lexical Semantic Changes
Ehsaneddin Asgari | Christoph Ringlstetter | Hinrich Schütze
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes EmbLexChange, a system introduced by the “Life-Language” team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames t_1 and t_2). The underlying assumption is that the lexical-semantic change of word w would affect its co-occurring words and subsequently alters the neighborhoods in the embedding spaces. We show that using a resampling framework for the selection of reference words (with conserved senses), we can more reliably detect lexical-semantic changes in English, German, Swedish, and Latin. EmbLexChange achieved second place in the binary detection of semantic changes in the SemEval-2020.

2017

Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Ehsaneddin Asgari | Hinrich Schütze
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting – to the best of our knowledge – the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.

2016

Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance
Ehsaneddin Asgari | Mohammad R.K. Mofrad
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

Text Analysis and Automatic Triage of Posts in a Mental Health Forum
Ehsaneddin Asgari | Soroush Nasiriany | Mohammad R.K. Mofrad
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

2013

Linguistic Resources and Topic Models for the Analysis of Persian Poems
Ehsaneddin Asgari | Jean-Cédric Chappelier
Proceedings of the Workshop on Computational Linguistics for Literature

Co-authors

Mohammad Mahdi Abootorabi 3

Mahdieh Soleymani Baghshah 3

Seyed Arshan Dalili 3

Mahshid Dehghani 3

Mohammad Hossein Rohban 3

Sadra Sabouri 3

Reihaneh Zohrabi 3

Fabienne Braune 2

Parnian Fazel 2

Alireza Ghahramani Kure 2

Nona Ghazizadeh 2

Farzaneh Goshtasb 2

Nadia Hajipour 2

Mohammad R.K. Mofrad 2

Hamid Reza Rabiee 2

Christoph Ringlstetter 2

Benjamin Roth 2

Parham AbedAzad 1

Sayed Hesam Alavian 1

Farahmand Alizadeh 1

Alaa Aljabari 1

Milad Alshomary 1

Valentin Barriere 1

Ahlam Bashiti 1

Hamid Behroozi 1

Md. Rafiul Biswas 1

Jean-Cédric Chappelier 1

Mahdi Dehghani 1

MohammadAmin Fazli 1

Mohammad Mehdi Gholinejad 1

Soroush Gooran 1

Sepand Haghighi 1

Hadi Khaled Hamoud 1

Nicolas Handke 1

Maximilian Heinrich 1

Arshia Hemmat 1

Mohammad Hosseini 1

Seyed Mohammad Hadi Hosseini 1

Mustafa Jarrar 1

Amir Hossein Kargaran 1

Shohreh Kasaei 1

Adel Khorramrouz 1

Johannes Kiesel 1

Mostafa Masumi 1

Sahel Mesforoush 1

George Mikros 1

Sajad Mirzababaei 1

Nailia Mirzakhmedova 1

Mohammad Mofrad 1

Bardia Mohammadi 1

Mohammadali Mohammadkhani 1

Parimehr Morassafar 1

Parsa Haghighi Naeini 1

Soroush Nasiriany 1

Vida Ramezanian 1

Arash Rasouli 1

Aryan Sadeghi 1

Erfan Sadraiye 1

Alireza Sahebi 1

Nima Salemahim 1

Mohammad Vali Sanian 1

Hamid Jahad Sarvestani 1

Neda Taghizadeh Serajeh 1

Amirahmad Shafiee 1

Bilal Mohammed Shalash 1

Zeinab Taghavi 1

Maryam Sadat Razavi Taheri 1

Henning Wachsmuth 1

Wajdi Zaghouani 1

Pardis Sadat Zahraei 1

Fadi A. Zaraket 1

Franziska Zimmermann 1

Amirhosein Zobeiri 1

Venues