Wenji Mao - ACL Anthology

2025

One Unified Model for Diverse Tasks: Emotion Cause Analysis via Self-Promote Cognitive Structure Modeling
Zhaoxin Yu | Xinglin Xiao | Wenji Mao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Emotion cause analysis is a critical topic in natural language processing. Key tasks include emotion cause extraction (ECE), emotion-cause pair extraction (ECPE), social emotion cause identification (SECI), and social emotion mining and its cause identification (SEMCI). While current emotion cause analysis methods often focus on task-specific model design, they tend to overlook the common ground across these tasks that is rooted in cognitive emotion theories, in particular the cognitive structure of emotions. Drawing inspiration from this theory, in this paper, we propose a unified model capable of tackling diverse emotion cause analysis tasks, which constructs the emotion cognitive structure through LLM-based in-context learning. To mitigate the hallucination inherent in LLMs, we introduce a self-promote mechanism built on iterative refinement. It dynamically assesses the reliability of substructures based on their cognitive consistency and leverages the more reliable substructures to promote the inconsistent ones. Experimental results on multiple emotion cause analysis tasks (ECE, ECPE, SECI and SEMCI) demonstrate the superiority of our unified model over existing SOTA methods and LLM-based baselines.

Perspective-driven Preference Optimization with Entropy Maximization for Diverse Argument Generation
Yilin Cao | Ruike Zhang | Penghui Wei | Qingchao Kong | Wenji Mao
Findings of the Association for Computational Linguistics: EMNLP 2025

In subjective natural language generation tasks, generating diverse perspectives is essential for fostering balanced discourse and mitigating bias. Argument generation with diverse perspectives plays a vital role in advancing the understanding of controversial claims. Despite the strong generative capabilities of large language models (LLMs), the diversity of perspectives remains insufficiently explored within the argument generation task. Moreover, there remains a significant research gap in developing methods that explicitly generate multi-perspective arguments under the quality control of claim-stance alignment constraints. In this paper, we propose POEM, a Perspective-aware Preference Optimization with Entropy Maximization framework for diverse argument generation. It enhances perspective diversity through preference optimization based on a preference dataset constructed via perspective mining and diversity measuring. It further introduces entropy maximization to promote perspective diversity by encouraging dispersed semantic representations among the generated arguments. Experimental results on claim-stance argument generation benchmarks show that POEM is capable of generating diverse arguments while maintaining comparable performance in claim and stance controllability as well as text quality compared to the state-of-the-art baselines and human evaluation.

DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
Minzheng Wang | Xinghua Zhang | Kun Chen | Nan Xu | Haiyang Yu | Fei Huang | Wenji Mao | Yongbin Li
Findings of the Association for Computational Linguistics: ACL 2025

Dialogue systems enabled by large language models (LLMs) have become one of the central modes of human-machine interaction, bringing about vast amounts of conversation logs and an increasing demand for dialogue generation. The dialogue’s life-cycle spans from Prelude through Interlocution to Epilogue, encompassing rich dialogue elements. Despite large volumes of dialogue-related studies, there is a lack of systematic investigation into the dialogue stages to frame benchmark construction that covers comprehensive dialogue elements. This hinders the precise modeling, generation and assessment of LLM-based dialogue systems. To bridge this gap, in this paper, we introduce a new research task, Dialogue Element MOdeling, which includes Element Awareness and Dialogue Agent Interaction, and propose a novel benchmark, DEMO, designed for comprehensive dialogue modeling and assessment. On this basis, we further build the DEMO agent with the adept ability to model dialogue elements via imitation learning. Extensive experiments on DEMO indicate that current representative LLMs still have considerable potential for enhancement, and our DEMO agent performs well in both dialogue element modeling and out-of-domain tasks.

ImaRA: An Imaginative Frame Augmented Method for Low-Resource Multimodal Metaphor Detection and Explanation
Yuan Tian | Minzheng Wang | Nan Xu | Wenji Mao
Findings of the Association for Computational Linguistics: NAACL 2025

Multimodal metaphor detection is an important and challenging task in multimedia computing, which aims to distinguish between metaphorical and literal multimodal expressions. Existing studies mainly utilize typical multimodal computing approaches for detection, neglecting the unique cross-domain and cross-modality characteristics underlying multimodal metaphor understanding. According to Conceptual Metaphor Theory (CMT), the inconsistency between source and target domains and their attribute similarity are essential to infer the intricate meanings implied in metaphors. In practice, the scarcity of annotated multimodal metaphorical content in the real world brings additional difficulty to the detection task and further complicates the understanding of multimodal metaphors. To address the above challenges, in this paper, we propose a novel Imaginative FRame Augmented (ImaRA) method for low-resource multimodal metaphor detection and explanation inspired by CMT. Specifically, we first identify the imaginative frame as an associative structure to stimulate the imaginative thinking of multimodal metaphor detection and understanding. We then construct a cross-modal imagination dataset rich in multimodal metaphors and corresponding imaginative frames, and retrieve an augmented instance from this imagination dataset using imaginative frames mined from the input. This augmented instance serves as the demonstration exemplar to boost the metaphor reasoning ability of the multimodal large language model (MLLM) in low-resource multimodal scenarios. Experiments on two publicly available datasets show that our method consistently achieves robust results compared to MLLM-based methods for both multimodal metaphor detection and explanation in low-resource scenarios and meanwhile surpasses existing multimodal metaphor detection methods with full training data.

2024

An LLM-Enabled Knowledge Elicitation and Retrieval Framework for Zero-Shot Cross-Lingual Stance Identification
Ruike Zhang | Yuan Tian | Penghui Wei | Daniel Dajun Zeng | Wenji Mao
Findings of the Association for Computational Linguistics: EMNLP 2024

Stance detection aims to identify the attitudes toward specific targets from text, which is an important research area in text mining and social media analytics. Existing research is mainly conducted in a monolingual setting on English datasets. To tackle the data scarcity problem in low-resource languages, cross-lingual stance detection (CLSD) transfers the knowledge from a high-resource (source) language to a low-resource (target) language. The CLSD task is the most challenging in the zero-shot setting, when no training data is available in the target language, and transferring stance-relevant knowledge learned from the high-resource language to bridge the language gap is the key to improving the performance of zero-shot CLSD. In this paper, we leverage the capability of large language models (LLMs) for stance knowledge acquisition, and propose KEAR, a knowledge elicitation and retrieval framework. The knowledge elicitation module in KEAR first derives different types of stance knowledge from the LLM’s reasoning process. Then, the knowledge retrieval module in KEAR matches the target language input to the most relevant stance knowledge for enhancing text representations. Experiments on multilingual datasets show the effectiveness of KEAR compared with competitive baselines as well as the CLSD approaches trained with labeled data in the target language.

Bridging Word-Pair and Token-Level Metaphor Detection with Explainable Domain Mining
Yuan Tian | Ruike Zhang | Nan Xu | Wenji Mao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Metaphor detection aims to identify whether a linguistic expression in text is metaphorical or literal. Most existing research tackles this problem using either word-pair or token-level information as input, and thus treats word-pair and token-level metaphor detection as distinct subtasks. Benefiting from the simplified structure of word pairs, recent methods for word-pair metaphor detection can provide intermediate explainable clues for the detection results, which remains a challenging issue for token-level metaphor detection. To mitigate this issue in token-level metaphor detection and take advantage of word pairs, in this paper, we make the first attempt to bridge word-pair and token-level metaphor detection via modeling word pairs within a sentence as explainable intermediate information. Given the central role of verbs in metaphorical expressions, we focus on token-level verb metaphor detection and propose a novel explainable Word Pair based Domain Mining (WPDM) method. Our work is inspired by conceptual metaphor theory (CMT). We first devise an approach for conceptual domain mining utilizing semantic role mapping and resources at cognitive, commonsense and lexical levels. We then leverage the inconsistency between source and target domains for core word pair modeling to facilitate explainability. Experiments on four datasets verify the effectiveness of our method and demonstrate its capability to provide the core word pair and corresponding conceptual domains as explainable clues for metaphor detection.

A Theory Guided Scaffolding Instruction Framework for LLM-Enabled Metaphor Reasoning
Yuan Tian | Nan Xu | Wenji Mao
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Metaphor detection is a challenging task in figurative language processing, which aims to distinguish between metaphorical and literal expressions in text. Existing methods tackle metaphor detection via training or fine-tuning discriminative models on labeled data. However, these approaches struggle to explain the underlying reasoning process behind the metaphorical/literal judgment. Recently, large language models (LLMs) have shown promise in language reasoning tasks. Although promising, LLM-based methods for metaphor detection and reasoning are still faced with the challenging issue of bringing the explainable concepts for metaphor reasoning and their linguistic manifestation. To fill this gap, we propose a novel Theory guided Scaffolding Instruction (TSI) framework that instructs an LLM to infer the underlying reasoning process of metaphor detection guided by metaphor theories for the first time. Our work is inspired by a pedagogical strategy called scaffolding instruction, which encourages educators to provide questioning and support as scaffolding so as to assist learners in constructing the understanding of pedagogical goals step by step. We first construct a metaphor knowledge graph grounded in metaphor theory which serves as the instructional structure to obtain a series of scaffolding questions, directing the LLM to incrementally generate the reasoning process for metaphor understanding through dialogue interactions. During this theory guided instruction process, we explore the LLM’s mastery boundary and provide the relevant knowledge as scaffolding support when the question is beyond the LLM’s capability. Experimental results verify that our method significantly outperforms both the LLM-based reasoning methods and the SOTA methods in metaphor detection, indicating the facilitation of metaphor and instruction theories in guiding LLM-based reasoning process.

TARA: Token-level Attribute Relation Adaptation for Multi-Attribute Controllable Text Generation
Yilin Cao | Jiahao Zhao | Ruike Zhang | Hanyi Zou | Wenji Mao
Findings of the Association for Computational Linguistics: EMNLP 2024

PromISe: Releasing the Capabilities of LLMs with Prompt Introspective Search
Minzheng Wang | Nan Xu | Jiahao Zhao | Yin Luo | Wenji Mao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The development of large language models (LLMs) raises the importance of assessing the fairness and completeness of various evaluation benchmarks. Regrettably, these benchmarks predominantly utilize uniform manual prompts, which may not fully capture the expansive capabilities of LLMs, potentially leading to an underestimation of their performance. To unlock the potential of LLMs, researchers have turned their attention to automated prompt search methods, which employ LLMs as optimizers to discover optimal prompts. However, previous methods generate the solutions implicitly, overlooking the underlying thought process and lacking explicit feedback. In this paper, we propose a novel prompt introspective search framework, namely PromISe, to better release the capabilities of LLMs. It converts the process of optimizing prompts into an explicit chain of thought, through a step-by-step procedure that integrates self-introspect and self-refine. Extensive experiments, conducted over 73 tasks on two major benchmarks, demonstrate that our proposed PromISe significantly boosts the performance of 12 well-known LLMs compared to the baseline approach. Moreover, our study offers enhanced insights into the interaction between humans and LLMs, potentially serving as a foundation for future designs and implementations.

Keywords: large language models, prompt search, self-introspect, self-refine

2023

Dynamic Routing Transformer Network for Multimodal Sarcasm Detection
Yuan Tian | Nan Xu | Ruike Zhang | Wenji Mao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal sarcasm detection is an important research topic in natural language processing and multimedia computing, and benefits a wide range of applications in multiple domains. Most existing studies regard the incongruity between image and text as the indicative clue in identifying multimodal sarcasm. To capture cross-modal incongruity, previous methods rely on fixed architectures in network design, which restricts the model from dynamically adjusting to diverse image-text pairs. Inspired by routing-based dynamic networks, we model the dynamic mechanism in multimodal sarcasm detection and propose the Dynamic Routing Transformer Network (DynRT-Net). Our method utilizes dynamic paths to activate different routing transformer modules with hierarchical co-attention adapting to cross-modal incongruity. Experimental results on a public dataset demonstrate the effectiveness of our method compared to the state-of-the-art methods. Our code is available at https://github.com/TIAN-viola/DynRT.

Generative Adversarial Training with Perturbed Token Detection for Model Robustness
Jiahao Zhao | Wenji Mao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Adversarial training is the dominant strategy towards model robustness. Current adversarial training methods typically apply perturbations to embedding representations, whereas actual text-based attacks introduce perturbations as discrete tokens. Thus there exists a gap between the continuous embedding representations and discrete text tokens that hampers the effectiveness of adversarial training. Moreover, the continuous representations of perturbations cannot be further utilized, resulting in suboptimal performance. To bridge this gap for adversarial robustness, in this paper, we devise a novel generative adversarial training framework that integrates gradient-based learning, adversarial example generation and perturbed token detection. Our proposed framework consists of a generative adversarial attack and an adversarial training process. Specifically, in the generative adversarial attack, the embeddings are shared between the classifier and the generative model, which enables the generative model to leverage the gradients from the classifier for generating perturbed tokens. Then, the adversarial training process combines adversarial regularization with perturbed token detection to provide token-level supervision and improve the efficiency of sample utilization. Extensive experiments on five datasets from the AdvGLUE benchmark demonstrate that our framework significantly enhances model robustness, surpassing the state-of-the-art results of ChatGPT by 10% in average accuracy.

Target-Oriented Relation Alignment for Cross-Lingual Stance Detection
Ruike Zhang | Nan Xu | Hanxuan Yang | Yuan Tian | Wenji Mao
Findings of the Association for Computational Linguistics: ACL 2023

Stance detection is an important task in text mining and social media analytics, aiming to automatically identify the user’s attitude toward a specific target from text, and has wide applications in a variety of domains. Previous work on stance detection has mainly focused on the monolingual setting. To address the problem of imbalanced language resources, cross-lingual stance detection is proposed to transfer the knowledge learned from a high-resource (source) language (typically English) to another low-resource (target) language. However, existing research on cross-lingual stance detection has ignored the inconsistency in the occurrences and distributions of targets between languages, which consequently degrades the performance of stance detection in low-resource languages. In this paper, we first identify the target inconsistency issue in cross-lingual stance detection, and propose a fine-grained Target-oriented Relation Alignment (TaRA) method for the task, which considers both target-level associations and language-level alignments. Specifically, we propose the Target Relation Graph to learn the in-language and cross-language target associations. We further devise the relation alignment strategy to enable knowledge transfer between semantically correlated targets across languages. Experimental results on the representative datasets demonstrate the effectiveness of our method compared to competitive methods under various settings.

Cross-Lingual Cross-Target Stance Detection with Dual Knowledge Distillation Framework
Ruike Zhang | Hanxuan Yang | Wenji Mao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Stance detection aims to identify the user’s attitude toward specific targets from text, which is an important research area in text mining and benefits a variety of application domains. Existing studies on stance detection were conducted mainly in English. Due to the low-resource problem in most non-English languages, cross-lingual stance detection was proposed to transfer knowledge from a high-resource (source) language to a low-resource (target) language. However, previous research has ignored the practical issue of having no labeled training data available in the target language. Moreover, target inconsistency in cross-lingual stance detection brings about the additional issue of unseen targets in the target language, which in essence requires the transfer of both language and target-oriented knowledge from the source to the target language. To tackle these challenging issues, in this paper, we propose the new task of cross-lingual cross-target stance detection and develop the first computational work with dual knowledge distillation. Our proposed framework designs a cross-lingual teacher and a cross-target teacher using the source language data and a dual distillation process that transfers the two types of knowledge to the target language. To bridge the target discrepancy between languages, the cross-target teacher mines target category information and generalizes it to the unseen targets in the target language via category-oriented learning. Experimental results on multilingual stance datasets demonstrate the effectiveness of our method compared to the competitive baselines.

Modeling Conceptual Attribute Likeness and Domain Inconsistency for Metaphor Detection
Yuan Tian | Nan Xu | Wenji Mao | Daniel Zeng
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Metaphor detection is an important and challenging task in natural language processing, which aims to distinguish between metaphorical and literal expressions in text. Previous studies mainly leverage the incongruity of source and target domains and contextual clues for detection, neglecting similar attributes shared between source and target concepts in metaphorical expressions. According to conceptual metaphor theory, these similar attributes are essential to infer implicit meanings conveyed by the metaphor. Under the guidance of conceptual metaphor theory, in this paper, we model attribute likeness for the first time and propose a novel Attribute Likeness and Domain Inconsistency Learning framework (AIDIL) for word-pair metaphor detection. Specifically, we propose an attribute siamese network to mine similar attributes between source and target concepts. We then devise a domain contrastive learning strategy to learn the semantic inconsistency of concepts in source and target domains. Extensive experiments on four datasets verify that our method significantly outperforms the previous state-of-the-art methods, and demonstrate the generalization ability of our method.

2020

Reasoning with Multimodal Sarcastic Tweets via Modeling Cross-Modality Contrast and Semantic Association
Nan Xu | Zhixiong Zeng | Wenji Mao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Sarcasm is a sophisticated linguistic phenomenon used to express the opposite of what one really means. With the rapid growth of social media, multimodal sarcastic tweets are widely posted on various social platforms. In a multimodal context, sarcasm is no longer a purely linguistic phenomenon, and due to the nature of short social media text, the opposite is more often manifested via cross-modality expressions. Thus traditional text-based methods are insufficient to detect multimodal sarcasm. To reason with multimodal sarcastic tweets, in this paper, we propose a novel method for modeling cross-modality contrast in the associated context. Our method models both cross-modality contrast and semantic association by constructing the Decomposition and Relation Network (namely D&R Net). The decomposition network represents the commonality and discrepancy between image and text, and the relation network models the semantic association in cross-modality context. Experimental results on a public dataset demonstrate the effectiveness of our model in multimodal sarcasm detection.

Effective Inter-Clause Modeling for End-to-End Emotion-Cause Pair Extraction
Penghui Wei | Jiahao Zhao | Wenji Mao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Emotion-cause pair extraction aims to extract all emotion clauses coupled with their cause clauses from a given document. Previous work employs two-step approaches, in which the first step extracts emotion clauses and cause clauses separately, and the second step trains a classifier to filter out negative pairs. However, such a pipeline-style system for emotion-cause pair extraction is suboptimal because it suffers from error propagation and the two steps may not adapt to each other well. In this paper, we tackle emotion-cause pair extraction from a ranking perspective, i.e., ranking clause pair candidates in a document, and propose a one-step neural approach which emphasizes inter-clause modeling to perform end-to-end extraction. It models the interrelations between the clauses in a document to learn clause representations with graph attention, and enhances clause pair representations with kernel-based relative position embedding for effective ranking. Experimental results show that our approach significantly outperforms the current two-step systems, especially in the condition of extracting multiple pairs in one document.

2019

Modeling Conversation Structure and Temporal Dynamics for Jointly Predicting Rumor Stance and Veracity
Penghui Wei | Nan Xu | Wenji Mao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Automatically verifying rumorous information has become an important and challenging task in natural language processing and social media analytics. Previous studies reveal that people’s stances towards rumorous messages can provide indicative clues for identifying the veracity of rumors, and thus determining the stances of public reactions is a crucial preceding step for rumor veracity prediction. In this paper, we propose a hierarchical multi-task learning framework for jointly predicting rumor stance and veracity on Twitter, which consists of two components. The bottom component of our framework classifies the stances of tweets in a conversation discussing a rumor via modeling the structural property based on a novel graph convolutional network. The top component predicts the rumor veracity by exploiting the temporal dynamics of stance evolution. Experimental results on two benchmark datasets show that our method outperforms previous methods in both rumor stance classification and veracity prediction.
