Rui Xia - ACL Anthology

Rui Xia

2025

VCD: A Dataset for Visual Commonsense Discovery in Images
Xiangqing Shen | Fanfan Wang | Siwei Wu | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2025

Visual commonsense plays a vital role in understanding and reasoning about the visual world. While commonsense knowledge bases like ConceptNet provide structured collections of general facts, they lack visually grounded representations. Scene graph datasets like Visual Genome, though rich in object-level descriptions, primarily focus on directly observable information and lack systematic categorization of commonsense knowledge. We present Visual Commonsense Dataset (VCD), a large-scale dataset containing over 100,000 images and 14 million object-commonsense pairs that bridges this gap. VCD introduces a novel three-level taxonomy for visual commonsense, integrating both Seen (directly observable) and Unseen (inferrable) commonsense across Property, Action, and Space aspects. Each commonsense is represented as a triple where the head entity is grounded to object bounding boxes in images, enabling scene-dependent and object-specific visual commonsense representation. To demonstrate VCD’s utility, we develop VCM, a generative model that combines a vision-language model with instruction tuning to discover diverse visual commonsense from images. Extensive evaluations demonstrate both the high quality of VCD and its value as a resource for advancing visually grounded commonsense understanding and reasoning. Our dataset and code will be released on https://github.com/NUSTM/VCD.

ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains
Zilu Dong | Xiangqing Shen | Zinong Yang | Rui Xia
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current knowledge editing methods for large language models (LLMs) struggle to maintain logical consistency when propagating ripple effects to associated facts. We propose ChainEdit, a framework that synergizes knowledge graph-derived logical rules with LLM logical reasoning capabilities to enable systematic chain updates. By automatically extracting logical patterns from structured knowledge bases and aligning them with LLMs’ internal logics, ChainEdit dynamically generates and edits logically connected knowledge clusters. Experiments demonstrate an improvement of more than 30% in logical generalization over baselines while preserving editing reliability and specificity. We further address evaluation biases in existing benchmarks through knowledge-aware protocols that disentangle external dependencies. This work establishes new state-of-the-art performance on ripple effect while ensuring internal logical consistency after knowledge editing.

From Phrases to Subgraphs: Fine-Grained Semantic Parsing for Knowledge Graph Question Answering
Yurun Song | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2025

The recent emergence of large language models (LLMs) has brought new opportunities to knowledge graph question answering (KGQA), but also introduces challenges such as semantic misalignment and reasoning noise. Semantic parsing (SP), previously a mainstream approach for KGQA, enables precise graph pattern matching by mapping natural language queries to executable logical forms. However, it faces limitations in scalability and generalization, especially when dealing with complex, multi-hop reasoning tasks.In this work, we propose a Fine-Grained Semantic Parsing (FGSP) framework for KGQA. Our framework constructs a fine-grained mapping library via phrase-level segmentation of historical question-logical form pairs, and performs online retrieval and fusion of relevant subgraph fragments to answer complex queries. This fine-grained, compositional approach ensures tighter semantic alignment between questions and knowledge graph structures, enhancing both interpretability and adaptability to diverse query types. Experimental results on two KGQA benchmarks demonstrate the effectiveness of FGSP, with a notable 18.5% relative F1 performance improvement over the SOTA on the complex multi-hop CWQ dataset. Our code is available at https://github.com/NUSTM/From-Phrases-to-Subgraphs.

MEMIT-Merge: Addressing MEMIT’s Key-Value Conflicts in Same-Subject Batch Editing for LLMs
Zilu Dong | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2025

As large language models (LLMs) continue to scale up, knowledge editing techniques that modify models’ internal knowledge without full retraining have gained significant attention. MEMIT, a prominent batch editing algorithm, stands out for its capability to perform mass knowledge modifications. However, we uncovers a critical limitation that MEMIT’s editing efficacy significantly deteriorates when processing batches containing multiple edits sharing the same subject. Our analysis reveals the root cause lies in MEMIT’s key-value modeling framework: when multiple facts with the same subject in a batch are modeled through MEMIT’s key-value mechanism, identical keys (derived from the shared subject) are forced to represent different values (corresponding to distinct knowledge), resulting in update conflicts during editing. Addressing this issue, we propose MEMIT-Merge, an enhanced approach that merges value computation processes for facts sharing the same subject, effectively resolving the performance degradation in same-subject batch editing scenarios. Experimental results demonstrate that at a batch size of 5, while the original MEMIT’s success rate drops to 46%, MEMIT-Merge maintains a 98% editing success rate, showcasing remarkable robustness to subject entity collisions.

Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning
Fanfan Wang | Xiangqing Shen | Jianfei Yu | Rui Xia
Findings of the Association for Computational Linguistics: EMNLP 2025

Emotional Support Conversation (ESC) systems aim to alleviate user distress. However, current Chain-of-Thought based ESC methods often employ rigid, text-only reasoning, limiting adaptability in dynamic, multimodal interactions and introducing reasoning noise that degrades support quality. To address this, we introduce “Flexible Thinking” for multimodal ESC, enabling models to adaptively select contextually relevant thinking aspects: Visual Scene, Emotion, Situation, and Response Strategy. We first construct training data by manually curating flexible thinking demonstrations on the MESC dataset, then using a Multimodal Large Language Model to synthesize these processes for the full training set. Then, we propose FIRES, a framework integrating Supervised Fine-Tuning (SFT) for initial learning with Reinforcement Learning for refinement. This two-stage approach helps FIRES transcend SFT’s generalization limits and, crucially, directly links thinking processes to response quality via tailored rewards, moving beyond imitating potentially imperfect synthetic data. Experiments on MESC and EMOTyDA datasets demonstrate FIRES’s effectiveness and generalizability in fostering higher-quality emotional support responses through adaptive reasoning.

2024

Ask Again, Then Fail: Large Language Models’ Vacillations in Judgment
Qiming Xie | Zengzhi Wang | Yi Feng | Rui Xia
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We observe that current large language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a Follow-up Questioning Mechanism along with two metrics to quantify this inconsistency, confirming its widespread presence in current large language models. Furthermore, to mitigate this issue, we explore various prompting strategies for closed-source models, and develop a training-based framework Unwavering-FQ that teaches large language models to maintain their originally correct judgments through synthesized high-quality preference data. Our experimental results confirm the effectiveness of our framework and its ability to enhance the general capabilities of large language models.

A Joint Coreference-Aware Approach to Document-Level Target Sentiment Analysis
Hongjie Cai | Heqing Ma | Jianfei Yu | Rui Xia
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most existing work on aspect-based sentiment analysis (ABSA) focuses on the sentence level, while research at the document level has not received enough attention. Compared to sentence-level ABSA, the document-level ABSA is not only more practical but also requires holistic document-level understanding capabilities such as coreference resolution. To investigate the impact of coreference information on document-level ABSA, we conduct a three-stage research for the document-level target sentiment analysis (DTSA) task: 1) exploring the effectiveness of coreference information for the DTSA task; 2) reducing the reliance on manually annotated coreference information; 3) alleviating the evaluation bias caused by missing the coreference information of opinion targets. Specifically, we first manually annotate the coreferential opinion targets and propose a multi-task learning framework to jointly model the DTSA task and the coreference resolution task. Then we annotate the coreference information with ChatGPT for joint training. Finally, to address the issue of missing coreference targets, we modify the metrics from strict matching to a loose matching method based on the clusters of targets. The experimental results not only demonstrate the effectiveness of our framework but also reflect the feasibility of using ChatGPT-annotated coreferential entities and the applicability of the modified metrics. Our source code is publicly released at https://github.com/NUSTM/DTSA-Coref.

SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
Fanfan Wang | Heqing Ma | Rui Xia | Jianfei Yu | Erik Cambria
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual’s emotional state in conversations, is of great importance in many application scenarios. We organize SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, which aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE). The shared task has attracted 143 registrations and 216 successful submissions.In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.

2023

Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering
Siwei Wu | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2023

The nodes in the commonsense knowledge graph (CSKG) are normally represented by free-form short text (e.g., word or phrase). Different nodes may represent the same concept. This leads to the problems of edge sparsity and node redundancy, which challenges CSKG representation and completion. On the one hand, edge sparsity limits the performance of graph representation learning; On the other hand, node redundancy makes different nodes corresponding to the same concept have inconsistent relations with other nodes. To address the two problems, we propose a new CSKG completion framework based on Contrastive Pretraining and Node Clustering (CPNC). Contrastive Pretraining constructs positive and negative head-tail node pairs on CSKG and utilizes contrastive learning to obtain better semantic node representation. Node Clustering aggregates nodes with the same concept into a latent concept, assisting the task of CSKG completion. We evaluate our CPNC approach on two CSKG completion benchmarks (CN-100K and ATOMIC), where CPNC outperforms the state-of-the-art methods. Extensive experiments demonstrate that both Contrastive Pretraining and Node Clustering can significantly improve the performance of CSKG completion. The source code of CPNC is publicly available on https://github.com/NUSTM/CPNC.

Generative Emotion Cause Triplet Extraction in Conversations with Commonsense Knowledge
Fanfan Wang | Jianfei Yu | Rui Xia
Findings of the Association for Computational Linguistics: EMNLP 2023

Emotion Cause Triplet Extraction in Conversations (ECTEC) aims to simultaneously extract emotion utterances, emotion categories, and cause utterances from conversations. However, existing studies mainly decompose the ECTEC task into multiple subtasks and solve them in a pipeline manner. Moreover, since conversations tend to contain many informal and implicit expressions, it often requires external knowledge and reasoning-based inference to accurately identify emotional and causal clues implicitly mentioned in the context, which are ignored by previous work. To address these limitations, in this paper, we propose a commonSense knowledge-enHanced generAtive fRameworK named SHARK, which formulates the ECTEC task as an index generation problem and generates the emotion-cause-category triplets in an end-to-end manner with a sequence-to-sequence model. Furthermore, we propose to incorporate both retrieved and generated commonsense knowledge into the generative model via a dual-view gate mechanism and a graph attention layer. Experimental results show that our SHARK model consistently outperforms several competitive systems on two benchmark datasets. Our source codes are publicly released at https://github.com/NUSTM/SHARK.

Grounded Multimodal Named Entity Recognition on Social Media
Jianfei Yu | Ziyan Li | Jieming Wang | Rui Xia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In recent years, Multimodal Named Entity Recognition (MNER) on social media has attracted considerable attention. However, existing MNER studies only extract entity-type pairs in text, which is useless for multimodal knowledge graph construction and insufficient for entity disambiguation. To solve these issues, in this work, we introduce a Grounded Multimodal Named Entity Recognition (GMNER) task. Given a text-image social post, GMNER aims to identify the named entities in text, their entity types, and their bounding box groundings in image (i.e. visual regions). To tackle the GMNER task, we construct a Twitter dataset based on two existing MNER datasets. Moreover, we extend four well-known MNER methods to establish a number of baseline systems and further propose a Hierarchical Index generation framework named H-Index, which generates the entity-type-region triples in a hierarchical manner with a sequence-to-sequence model. Experiment results on our annotated dataset demonstrate the superiority of our H-Index framework over baseline systems on the GMNER task.

Emotion Cause Extraction on Social Media without Human Annotation
Debin Xiao | Rui Xia | Jianfei Yu
Findings of the Association for Computational Linguistics: ACL 2023

In social media, there is a vast amount of information pertaining to people’s emotions and the corresponding causes. The emotion cause extraction (ECE) from social media data is an important research area that has not been thoroughly explored due to the lack of fine-grained annotations. Early studies referred to either unsupervised rule-based methods or supervised machine learning methods using a number of manually annotated data in specific domains. However, the former suffers from limitations in extraction performance, while the latter is constrained by the availability of fine-grained annotations and struggles to generalize to diverse domains. To address these issues, this paper proposes a new ECE framework on Chinese social media that achieves high extraction performance and generalizability without relying on human annotation. Specifically, we design a more dedicated rule-based system based on constituency parsing tree to discover causal patterns in social media. This system enables us to acquire large amounts of fine-grained annotated data. Next, we train a neural model on the rule-annotated dataset with a specific training strategy to further improve the model’s generalizability. Extensive experiments demonstrate the superiority of our approach over other methods in unsupervised and weakly-supervised settings.

UniCOQE: Unified Comparative Opinion Quintuple Extraction As A Set
Zinong Yang | Feng Xu | Jianfei Yu | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2023

Comparative Opinion Quintuple Extraction (COQE) aims to identify comparative opinion sentences in product reviews, extract comparative opinion elements in the sentences, and then incorporate them into quintuples. Existing methods decompose the COQE task into multiple primary subtasks and then solve them in a pipeline manner. However, these approaches ignore the intrinsic connection between subtasks and the error propagation among stages. This paper proposes a unified generative model, UniCOQE, to solve the COQE task in one shot. We design a generative template where all the comparative tuples are concatenated as the target output sequence. However, the multiple tuples are inherently not an ordered sequence but an unordered set. The pre-defined order will force the generative model to learn a false order bias and hinge the model’s training. To alleviate this bias, we introduce a new “predict-and-assign” training paradigm that models the golden tuples as a set. Specifically, we utilize a set-matching strategy to find the optimal order of tuples. The experimental results on multiple benchmarks show that our unified generative model significantly outperforms the SOTA method, and ablation experiments prove the effectiveness of the set-matching strategy.

A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations
Wenjie Zheng | Jianfei Yu | Rui Xia | Shijin Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal Emotion Recognition in Multiparty Conversations (MERMC) has recently attracted considerable attention. Due to the complexity of visual scenes in multi-party conversations, most previous MERMC studies mainly focus on text and audio modalities while ignoring visual information. Recently, several works proposed to extract face sequences as visual features and have shown the importance of visual information in MERMC. However, given an utterance, the face sequence extracted by previous methods may contain multiple people’s faces, which will inevitably introduce noise to the emotion prediction of the real speaker. To tackle this issue, we propose a two-stage framework named Facial expressionaware Multimodal Multi-Task learning (FacialMMT). Specifically, a pipeline method is first designed to extract the face sequence of the real speaker of each utterance, which consists of multimodal face recognition, unsupervised face clustering, and face matching. With the extracted face sequences, we propose a multimodal facial expression-aware emotion recognition model, which leverages the frame-level facial emotion distributions to help improve utterance-level emotion recognition based on multi-task learning. Experiments demonstrate the effectiveness of the proposed FacialMMT framework on the benchmark MELD dataset. The source code is publicly released at https://github.com/NUSTM/FacialMMT.

A Sequence-to-Structure Approach to Document-level Targeted Sentiment Analysis
Nan Song | Hongjie Cai | Rui Xia | Jianfei Yu | Zhen Wu | Xinyu Dai
Findings of the Association for Computational Linguistics: EMNLP 2023

Most previous studies on aspect-based sentiment analysis (ABSA) were carried out at the sentence level, while the research of document-level ABSA has not received enough attention. In this work, we focus on the document-level targeted sentiment analysis task, which aims to extract the opinion targets consisting of multi-level entities from a review document and predict their sentiments. We propose a Sequence-to-Structure (Seq2Struct) approach to address the task, which is able to explicitly model the hierarchical structure among multiple opinion targets in a document, and capture the long-distance dependencies among affiliated entities across sentences. In addition to the existing Seq2Seq approach, we further construct four strong baselines with different pretrained models. Experimental results on six domains show that our Seq2Struct approach outperforms all the baselines significantly. Aside from the performance advantage in outputting the multi-level target-sentiment pairs, our approach has another significant advantage - it can explicitly display the hierarchical structure of the opinion targets within a document. Our source code is publicly released at https://github.com/NUSTM/Doc-TSA-Seq2Struct.

Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths
Xiangqing Shen | Siwei Wu | Rui Xia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing everyday if-then knowledge triplets, i.e., head event, relation, tail event. The one-hop annotation manner made ATOMIC a set of independent bipartite graphs, which ignored the numerous links between events in different bipartite graphs and consequently caused shortages in knowledge coverage and multi-hop paths. In this work, we aim to construct Dense-ATOMIC with high knowledge coverage and massive multi-hop paths. The events in ATOMIC are normalized to a consistent pattern at first. We then propose a CSKG completion method called Rel-CSKGC to predict the relation given the head event and the tail event of a triplet, and train a CSKG completion model based on existing triplets in ATOMIC. We finally utilize the model to complete the missing links in ATOMIC and accordingly construct Dense-ATOMIC. Both automatic and human evaluation on an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over strong baselines. We further conduct extensive evaluations on Dense-ATOMIC in terms of statistics, human evaluation, and simple downstream tasks, all proving Dense-ATOMIC’s advantages in Knowledge Coverage and Multi-hop Paths. Both the source code of Rel-CSKGC and Dense-ATOMIC are publicly available on https://github.com/NUSTM/Dense-ATOMIC.

Cross-Domain Data Augmentation with Domain-Adaptive Language Modeling for Aspect-Based Sentiment Analysis
Jianfei Yu | Qiankun Zhao | Rui Xia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-domain Aspect-Based Sentiment Analysis (ABSA) aims to leverage the useful knowledge from a source domain to identify aspect-sentiment pairs in sentences from a target domain. To tackle the task, several recent works explore a new unsupervised domain adaptation framework, i.e., Cross-Domain Data Augmentation (CDDA), aiming to directly generate much labeled target-domain data based on the labeled source-domain data. However, these CDDA methods still suffer from several issues: 1) preserving many source-specific attributes such as syntactic structures; 2) lack of fluency and coherence; 3) limiting the diversity of generated data. To address these issues, we propose a new cross-domain Data Augmentation approach based on Domain-Adaptive Language Modeling named DA²LM, which contains three stages: 1) assigning pseudo labels to unlabeled target-domain data; 2) unifying the process of token generation and labeling with a Domain-Adaptive Language Model (DALM) to learn the shared context and annotation across domains; 3) using the trained DALM to generate labeled target-domain data. Experiments show that DA²LM consistently outperforms previous feature adaptation and CDDA methods on both ABSA and Aspect Extraction tasks. The source code is publicly released at https://github.com/NUSTM/DALM.

2022

Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction
Junjie Li | Jianfei Yu | Rui Xia
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

As a fundamental task in opinion mining, aspect and opinion co-extraction aims to identify the aspect terms and opinion terms in reviews. However, due to the lack of fine-grained annotated resources, it is hard to train a robust model for many domains. To alleviate this issue, unsupervised domain adaptation is proposed to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we propose a new Generative Cross-Domain Data Augmentation framework for unsupervised domain adaptation. The proposed framework is aimed to generate target-domain data with fine-grained annotation by exploiting the labeled data in the source domain. Specifically, we remove the domain-specific segments in a source-domain labeled sentence, and then use this as input to a pre-trained sequence-to-sequence model BART to simultaneously generate a target-domain sentence and predict the corresponding label for each word. Experimental results on three datasets demonstrate that our approach is more effective than previous domain adaptation methods.

Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
Yan Ling | Jianfei Yu | Rui Xia
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention inrecent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore the crossmodalalignment or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate to identify fine-grainedaspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-LanguagePre-training framework for MABSA (VLP-MABSA), which is a unified multimodal encoder-decoder architecture for all the pretrainingand downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodalmodalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pre-training task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.

2021

Reinforced Counterfactual Data Augmentation for Dual Sentiment Classification
Hao Chen | Rui Xia | Jianfei Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Data augmentation and adversarial perturbation approaches have recently achieved promising results in solving the over-fitting problem in many natural language processing (NLP) tasks including sentiment classification. However, existing studies aimed to improve the generalization ability by augmenting the training data with synonymous examples or adding random noises to word embeddings, which cannot address the spurious association problem. In this work, we propose an end-to-end reinforcement learning framework, which jointly performs counterfactual data generation and dual sentiment classification. Our approach has three characteristics:1) the generator automatically generates massive and diverse antonymous sentences; 2) the discriminator contains a original-side sentiment predictor and an antonymous-side sentiment predictor, which jointly evaluate the quality of the generated sample and help the generator iteratively generate higher-quality antonymous samples; 3) the discriminator is directly used as the final sentiment classifier without the need to build an extra one. Extensive experiments show that our approach outperforms strong data augmentation baselines on several benchmark sentiment classification datasets. Further analysis confirms our approach’s advantages in generating more diverse training samples and solving the spurious association problem in sentiment classification.

Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions
Hongjie Cai | Rui Xia | Jianfei Yu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Product reviews contain a large number of implicit aspects and implicit opinions. However, most of the existing studies in aspect-based sentiment analysis ignored this problem. In this work, we introduce a new task, named Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction, with the goal to extract all aspect-category-opinion-sentiment quadruples in a review sentence and provide full support for aspect-based sentiment analysis with implicit aspects and opinions. We furthermore construct two new datasets, Restaurant-ACOS and Laptop-ACOS, for this new task, both of which contain the annotations of not only aspect-category-opinion-sentiment quadruples but also implicit aspects and opinions. The former is an extension of the SemEval Restaurant dataset; the latter is a newly collected and annotated Laptop dataset, twice the size of the SemEval Laptop dataset. We finally benchmark the task with four baseline systems. Experiments demonstrate the feasibility of the new task and its effectiveness in extracting and describing implicit aspects and implicit opinions. The two datasets and source code of four systems are publicly released at https://github.com/NUSTM/ACOS.

Cross-Domain Review Generation for Aspect-Based Sentiment Analysis
Jianfei Yu | Chenggong Gong | Rui Xia
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
Heng Ji | Jong C. Park | Rui Xia
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

Comparative Opinion Quintuple Extraction from Product Reviews
Ziheng Liu | Rui Xia | Jianfei Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

As an important task in opinion mining, comparative opinion mining aims to identify comparative sentences from product reviews, extract the comparative elements, and obtain the corresponding comparative opinion tuples. However, most previous studies simply regarded comparative tuple extraction as comparative element extraction, but ignored the fact that many comparative sentences may contain multiple comparisons. The comparative opinion tuples defined in these studies also failed to explicitly provide comparative preferences. To address these limitations, in this work we first introduce a new Comparative Opinion Quintuple Extraction (COQE) task, to identify comparative sentences from product reviews and extract all comparative opinion quintuples (Subject, Object, Comparative Aspect, Comparative Opinion, Comparative Preference). Secondly, based on the existing comparative opinion mining corpora, we make supplementary annotations and construct three datasets for the COQE task. Finally, we benchmark the COQE task by proposing a new BERT-based multi-stage approach as well as three baseline systems extended from previous methods. %The new approach significantly outperforms three baseline systems on three datasets and represents a strong benchmark for COQE. Experimental results show that the new approach significantly outperforms three baseline systems on three datasets for the COQE task.

2020

Unified Feature and Instance Based Domain Adaptation for Aspect-Based Sentiment Analysis
Chenggong Gong | Jianfei Yu | Rui Xia
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The supervised models for aspect-based sentiment analysis (ABSA) rely heavily on labeled data. However, fine-grained labeled data are scarce for the ABSA task. To alleviate the dependence on labeled data, prior works mainly focused on feature-based adaptation, which used the domain-shared knowledge to construct auxiliary tasks or domain adversarial learning to bridge the gap between domains, while ignored the attribute of instance-based adaptation. To resolve this limitation, we propose an end-to-end framework to jointly perform feature and instance based adaptation for the ABSA task in this paper. Based on BERT, we learn domain-invariant feature representations by using part-of-speech features and syntactic dependency relations to construct auxiliary tasks, and jointly perform word-level instance weighting in the framework of sequence labeling. Experiment results on four benchmarks show that the proposed method can achieve significant improvements in comparison with the state-of-the-arts in both tasks of cross-domain End2End ABSA and cross-domain aspect extraction.

A State-independent and Time-evolving Network for Early Rumor Detection in Social Media
Rui Xia | Kaizhou Xuan | Jianfei Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we study automatic rumor detection for in social media at the event level where an event consists of a sequence of posts organized according to the posting time. It is common that the state of an event is dynamically evolving. However, most of the existing methods to this task ignored this problem, and established a global representation based on all the posts in the event’s life cycle. Such coarse-grained methods failed to capture the event’s unique features in different states. To address this limitation, we propose a state-independent and time-evolving Network (STN) for rumor detection based on fine-grained event state detection and segmentation. Given an event composed of a sequence of posts, STN first predicts the corresponding sequence of states and segments the event into several state-independent sub-events. For each sub-event, STN independently trains an encoder to learn the feature representation for that sub-event and incrementally fuses the representation of the current sub-event with previous ones for rumor prediction. This framework can more accurately learn the representation of an event in the initial stage and enable early rumor detection. Experiments on two benchmark datasets show that STN can significantly improve the rumor detection accuracy in comparison with some strong baseline systems. We also design a new evaluation metric to measure the performance of early rumor detection, under which STN shows a higher advantage in comparison.

Coupled Hierarchical Transformer for Stance-Aware Rumor Verification in Social Media Conversations
Jianfei Yu | Jing Jiang | Ling Min Serena Khoo | Hai Leong Chieu | Rui Xia
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The prevalent use of social media enables rapid spread of rumors on a massive scale, which leads to the emerging need of automatic rumor verification (RV). A number of previous studies focus on leveraging stance classification to enhance RV with multi-task learning (MTL) methods. However, most of these methods failed to employ pre-trained contextualized embeddings such as BERT, and did not exploit inter-task dependencies by using predicted stance labels to improve the RV task. Therefore, in this paper, to extend BERT to obtain thread representations, we first propose a Hierarchical Transformer, which divides each long thread into shorter subthreads, and employs BERT to separately represent each subthread, followed by a global Transformer layer to encode all the subthreads. We further propose a Coupled Transformer Module to capture the inter-task interactions and a Post-Level Attention layer to use the predicted stance labels for RV, respectively. Experiments on two benchmark datasets show the superiority of our Coupled Hierarchical Transformer model over existing MTL approaches.

End-to-End Emotion-Cause Pair Extraction based on Sliding Window Multi-Label Learning
Zixiang Ding | Rui Xia | Jianfei Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Emotion-cause pair extraction (ECPE) is a new task that aims to extract the potential pairs of emotions and their corresponding causes in a document. The existing methods first perform emotion extraction and cause extraction independently, and then perform emotion-cause pairing and filtering. However, the above methods ignore the fact that the cause and the emotion it triggers are inseparable, and the extraction of the cause without specifying the emotion is pathological, which greatly limits the performance of the above methods in the first step. To tackle these shortcomings, we propose two joint frameworks for ECPE: 1) multi-label learning for the extraction of the cause clauses corresponding to the specified emotion clause (CMLL) and 2) multi-label learning for the extraction of the emotion clauses corresponding to the specified cause clause (EMLL). The window of multi-label learning is centered on the specified emotion clause or cause clause and slides as their positions move. Finally, CMLL and EMLL are integrated to obtain the final result. We evaluate our model on a benchmark emotion cause corpus, the results show that our approach achieves the best performance among all compared systems on the ECPE task.

Aspect-Category based Sentiment Analysis with Hierarchical Graph Convolutional Network
Hongjie Cai | Yaofeng Tu | Xiangsheng Zhou | Jianfei Yu | Rui Xia
Proceedings of the 28th International Conference on Computational Linguistics

Most of the aspect based sentiment analysis research aims at identifying the sentiment polarities toward some explicit aspect terms while ignores implicit aspects in text. To capture both explicit and implicit aspects, we focus on aspect-category based sentiment analysis, which involves joint aspect category detection and category-oriented sentiment classification. However, currently only a few simple studies have focused on this problem. The shortcomings in the way they defined the task make their approaches difficult to effectively learn the inner-relations between categories and the inter-relations between categories and sentiments. In this work, we re-formalize the task as a category-sentiment hierarchy prediction problem, which contains a hierarchy output structure to first identify multiple aspect categories in a piece of text, and then predict the sentiment for each of the identified categories. Specifically, we propose a Hierarchical Graph Convolutional Network (Hier-GCN), where a lower-level GCN is to model the inner-relations among multiple categories, and the higher-level GCN is to capture the inter-relations between aspect categories and sentiments. Extensive evaluations demonstrate that our hierarchy output structure is superior over existing ones, and the Hier-GCN model can consistently achieve the best results on four benchmarks.

Grid Tagging Scheme for Aspect-oriented Fine-grained Opinion Extraction
Zhen Wu | Chengcan Ying | Fei Zhao | Zhifang Fan | Xinyu Dai | Rui Xia
Findings of the Association for Computational Linguistics: EMNLP 2020

Aspect-oriented Fine-grained Opinion Extraction (AFOE) aims at extracting aspect terms and opinion terms from review in the form of opinion pairs or additionally extracting sentiment polarity of aspect term to form opinion triplet. Because of containing several opinion factors, the complete AFOE task is usually divided into multiple subtasks and achieved in the pipeline. However, pipeline approaches easily suffer from error propagation and inconvenience in real-world scenarios. To this end, we propose a novel tagging scheme, Grid Tagging Scheme (GTS), to address the AFOE task in an end-to-end fashion only with one unified grid tagging task. Additionally, we design an effective inference strategy on GTS to exploit mutual indication between different opinion factors for more accurate extractions. To validate the feasibility and compatibility of GTS, we implement three different GTS models respectively based on CNN, BiLSTM, and BERT, and conduct experiments on the aspect-oriented opinion pair extraction and opinion triplet extraction datasets. Extensive experimental results indicate that GTS models outperform strong baselines significantly and achieve state-of-the-art performance.

ECPE-2D: Emotion-Cause Pair Extraction based on Joint Two-Dimensional Representation, Interaction and Prediction
Zixiang Ding | Rui Xia | Jianfei Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In recent years, a new interesting task, called emotion-cause pair extraction (ECPE), has emerged in the area of text emotion analysis. It aims at extracting the potential pairs of emotions and their corresponding causes in a document. To solve this task, the existing research employed a two-step framework, which first extracts individual emotion set and cause set, and then pair the corresponding emotions and causes. However, such a pipeline of two steps contains some inherent flaws: 1) the modeling does not aim at extracting the final emotion-cause pair directly; 2) the errors from the first step will affect the performance of the second step. To address these shortcomings, in this paper we propose a new end-to-end approach, called ECPE-Two-Dimensional (ECPE-2D), to represent the emotion-cause pairs by a 2D representation scheme. A 2D transformer module and two variants, window-constrained and cross-road 2D transformers, are further proposed to model the interactions of different emotion-cause pairs. The 2D representation, interaction, and prediction are integrated into a joint framework. In addition to the advantages of joint modeling, the experimental results on the benchmark emotion cause corpus show that our approach improves the F1 score of the state-of-the-art from 61.28% to 68.89%.

Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer
Jianfei Yu | Jing Jiang | Li Yang | Rui Xia
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we study Multimodal Named Entity Recognition (MNER) for social media posts. Existing approaches for MNER mainly suffer from two drawbacks: (1) despite generating word-aware visual representations, their word representations are insensitive to the visual context; (2) most of them ignore the bias brought by the visual context. To tackle the first issue, we propose a multimodal interaction module to obtain both image-aware word representations and word-aware visual representations. To alleviate the visual bias, we further propose to leverage purely text-based entity span detection as an auxiliary module, and design a Unified Multimodal Transformer to guide the final predictions with the entity span predictions. Experiments show that our unified approach achieves the new state-of-the-art performance on two benchmark datasets.

2019

Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts
Rui Xia | Zixiang Ding
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Emotion cause extraction (ECE), the task aimed at extracting the potential causes behind certain emotions in text, has gained much attention in recent years due to its wide applications. However, it suffers from two shortcomings: 1) the emotion must be annotated before cause extraction in ECE, which greatly limits its applications in real-world scenarios; 2) the way to first annotate emotion and then extract the cause ignores the fact that they are mutually indicative. In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document. We propose a 2-step approach to address this new ECPE task, which first performs individual emotion extraction and cause extraction via multi-task learning, and then conduct emotion-cause pairing and filtering. The experimental results on a benchmark emotion cause corpus prove the feasibility of the ECPE task as well as the effectiveness of our approach.

2018

Multimodal Relational Tensor Network for Sentiment and Emotion Classification
Saurav Sahay | Shachi H Kumar | Rui Xia | Jonathan Huang | Lama Nachman
Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)

Understanding Affect from video segments has brought researchers from the language, audio and video domains together. Most of the current multimodal research in this area deals with various techniques to fuse the modalities, and mostly treat the segments of a video independently. Motivated by the work of (Zadeh et al., 2017) and (Poria et al., 2017), we present our architecture, Relational Tensor Network, where we use the inter-modal interactions within a segment (intra-segment) and also consider the sequence of segments in a video to model the inter-segment inter-modal interactions. We also generate rich representations of text and audio modalities by leveraging richer audio and linguistic context alongwith fusing fine-grained knowledge based polarity scores from text. We present the results of our model on CMU-MOSEI dataset and show that our model outperforms many baselines and state of the art methods for sentiment classification and emotion recognition.

2017

Sentiment Lexicon Construction with Representation Learning Based on Hierarchical Sentiment Supervision
Leyi Wang | Rui Xia
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Sentiment lexicon is an important tool for identifying the sentiment polarity of words and texts. How to automatically construct sentiment lexicons has become a research topic in the field of sentiment analysis and opinion mining. Recently there were some attempts to employ representation learning algorithms to construct a sentiment lexicon with sentiment-aware word embedding. However, these methods were normally trained under document-level sentiment supervision. In this paper, we develop a neural architecture to train a sentiment-aware word embedding by integrating the sentiment supervision at both document and word levels, to enhance the quality of word embedding as well as the sentiment lexicon. Experiments on the SemEval 2013-2016 datasets indicate that the sentiment lexicon generated by our approach achieves the state-of-the-art performance in both supervised and unsupervised sentiment classification, in comparison with several strong sentiment lexicon construction methods.

2015

Co-training for Semi-supervised Sentiment Classification Based on Dual-view Bags-of-words Representation
Rui Xia | Cheng Wang | Xin-Yu Dai | Tao Li
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Bilingual Event Extraction: a Case Study on Trigger Type Determination
Zhu Zhu | Shoushan Li | Guodong Zhou | Rui Xia
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Dual Training and Dual Prediction for Polarity Classification
Rui Xia | Tao Wang | Xuelei Hu | Shoushan Li | Chengqing Zong
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

A POS-based Ensemble Model for Cross-domain Sentiment Classification
Rui Xia | Chengqing Zong
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Exploring the Use of Word Relation Features for Sentiment Classification
Rui Xia | Chengqing Zong
Coling 2010: Posters

2009

A Framework of Feature Selection Methods for Text Categorization
Shoushan Li | Rui Xia | Chengqing Zong | Chu-Ren Huang
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Co-authors

Shoushan Li (李寿山) 3

Chenggong Gong 2

Hai Leong Chieu 1

Yi Feng (冯毅) 1

Shachi H. Kumar 1

Chu-Ren Huang 1

Jonathan Huang 1

Ling Min Serena Khoo 1

Chengcan Ying 1

Xiangsheng Zhou 1

Guodong Zhou (周国栋) 1

Venues