Xiaojie Yuan - ACL Anthology

Xiaojie Yuan

2025

TaxoPro: A Plug-In LoRA-based Cross-Domain Method for Low-Resource Taxonomy Completion
Hongyuan Xu | Yuhang Niu | Ciyi Liu | Yanlong Wen | Xiaojie Yuan
Transactions of the Association for Computational Linguistics, Volume 13

Low-resource taxonomy completion aims to automatically insert new concepts into the existing taxonomy, in which only a few in-domain training samples are available. Recent studies have achieved considerable progress by incorporating prior knowledge from pre-trained language models (PLMs). However, these studies tend to overly rely on such knowledge and neglect the shareable knowledge across different taxonomies. In this paper, we propose TaxoPro, a plug-in LoRA-based cross-domain method, that captures shareable knowledge from the high- resource taxonomy to improve PLM-based low-resource taxonomy completion techniques. To prevent negative interference between domain-specific and domain-shared knowledge, TaxoPro decomposes cross- domain knowledge into domain-shared and domain-specific components, storing them using low-rank matrices (LoRA). Additionally, TaxoPro employs two auxiliary losses to regulate the flow of shareable knowledge. Experimental results demonstrate that TaxoPro improves PLM-based techniques, achieving state-of-the-art performance in completing low-resource taxonomies. Code is available at https://github.com/cyclexu/TaxoPro.

FGDGNN: Fine-Grained Dynamic Graph Neural Network for Rumor Detection on Social Media
Mei Guo | Chen Chen | Chunyan Hou | Yike Wu | Xiaojie Yuan
Findings of the Association for Computational Linguistics: ACL 2025

Detecting rumors on social media has become a crucial issue.Propagation structure-based methods have recently attracted increasing attention.When the propagation structure is represented by the dynamic graph, temporal information is considered.However, existing rumor detection models using dynamic graph typically focus only on coarse-grained temporal information and ignore the fine-grained temporal dynamics within individual snapshots and across snapshots.In this paper, we propose a novel Fine-Grained Dynamic Graph Neural Network (FGDGNN) model, which can incorporate the fine-grained temporal information of dynamic propagation graph in the intra-snapshot and dynamic embedding update mechanism in the inter-snapshots into a unified framework for rumor detection.Specifically, we first construct the edge-weighted propagation graph and the edge-aware graph isomorphism network is proposed.To obtain fine-grained temporal representations across snapshots, we propose an embedding transformation layer to update node embeddings.Finally, we integrate the temporal information in the inter-snapshots at the graph level to enhance the effectiveness of the proposed model.Extensive experiments conducted on three public real-world datasets demonstrate that our FGDGNN model achieves significant improvements compared with the state-of-the-art baselines.

SWAM: Adaptive Sliding Window and Memory-Augmented Attention Model for Rumor Detection
Mei Guo | Chen Chen | Chunyan Hou | Yike Wu | Xiaojie Yuan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Detecting rumors on social media has become a critical task in combating misinformation. Existing propagation-based rumor detection methods often focus on the static propagation graph, overlooking that rumor propagation is inherently dynamic and incremental in the real world. Recently propagation-based rumor detection models attempt to use the dynamic graph that is associated with coarse-grained temporal information. However, these methods fail to capture the long-term time dependency and detailed temporal features of propagation. To address these issues, we propose a novel adaptive Sliding Window and memory-augmented Attention Model (SWAM) for rumor detection. The adaptive sliding window divides the sequence of posts into consecutive disjoint windows based on the propagation rate of nodes. We also propose a memory-augmented attention to capture the long-term dependency and the depth of nodes in the propagation graph. Multi-head attention mechanism is applied between nodes in the memorybank and incremental nodes to iteratively update the memorybank, and the depth information of nodes is also considered. Finally, the propagation features of nodes in the memorybank are utilized for rumor detection. Experimental results on two public real-world datasets demonstrate the effectiveness of our model compared with the state-of-the-art baselines.

InstructGEC: Enhancing Unsupervised Grammatical Error Correction with Instruction Tuning
Jiayi Deng | Chen Chen | Chunyan Hou | Xiaojie Yuan
Proceedings of the 31st International Conference on Computational Linguistics

Recent works have proposed methods of generating synthetic data automatically for unsupervised Grammatical Error Correction (GEC). Although a large amount of synthetic data is generated at a low cost, it is unrealistic and of poor quality. The copying phenomenon of synthetic data prevents GEC models from learning the semantic knowledge of contextual language. In this paper, we design an instruction format and use the masking strategy in both an erroneous sentence and the corresponding instruction consistently to alleviate the impact of the copy phenomenon. We also propose a novel approach, InstructGEC, which integrates the knowledge of grammatical detection into GEC models with instruction tuning to address the low-quality issue. Experiments are conducted on English and Chinese GEC datasets and results demonstrate that our method outperforms state-of-the-art unsupervised GEC methods.

DTDES-KGE: Dual-Teacher Knowledge Distillation with Distinct Embedding Spaces for Knowledge Graph Embeddings
Bofan Wei | Hongyuan Xu | Yuhang Niu | Jiarui Ren | Yanlong Wen | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2025

Knowledge distillation for knowledge graph embedding (KGE) models effectively compresses KGE models by reducing their embedding dimensions. While existing methods distill knowledge from a high-dimensional teacher to a low-dimensional student, they typically rely on a single teacher embedding space, thereby overlooking valuable complementary knowledge from teachers in distinct embedding spaces. This paper introduces DTDES-KGE, a novel knowledge distillation framework that significantly enhances distillation performance by leveraging dual teachers in distinct embedding spaces. To overcome the challenge of spatial heterogeneity when integrating knowledge from dual teachers, we propose a spatial compatibility module for reconciliation. Additionally, we introduce a student-aware knowledge fusion mechanism to fuse the knowledge from dual teachers dynamically. Extensive experiments on two real-world datasets validate the effectiveness of DTDES-KGE.

SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention
Jiaqi Wu | Chen Chen | Chunyan Hou | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2025

With the widespread real-world deployment of large language models (LLMs), ensuring their behavior complies with safety standards has become crucial. Jailbreak attacks exploit vulnerabilities in LLMs to induce undesirable behavior, posing a significant threat to LLM safety. Previous defenses often fail to achieve both effectiveness and efficiency simultaneously. Defenses from a representation perspective offer new insights, but existing interventions cannot dynamically adjust representations based on the harmfulness of the queries. To address this limitation, we propose SafeIntervention (SafeInt), a novel defense method that shields LLMs from jailbreak attacks through safety-aware representation intervention. Built on our analysis of the representations of jailbreak samples, the core idea of SafeInt is to relocate jailbreak-related representations into the rejection region. This is achieved by intervening in the representation distributions of jailbreak samples to align them with those of unsafe samples. We conduct comprehensive experiments covering six jailbreak attacks, two jailbreak datasets, and two utility benchmarks. Experimental results demonstrate that SafeInt outperforms all baselines in defending LLMs against jailbreak attacks while largely maintaining utility. Additionally, we evaluate SafeInt against adaptive attacks and verify its effectiveness in mitigating real-time attacks.

2024

Look before You Leap: Dual Logical Verification for Knowledge-based Visual Question Generation
Xumeng Liu | Wenya Guo | Ying Zhang | Xubo Liu | Yu Zhao | Shenglong Yu | Xiaojie Yuan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Knowledge-based Visual Question Generation aims to generate visual questions with outside knowledge other than the image. Existing approaches are answer-aware, which incorporate answers into the question-generation process. However, these methods just focus on leveraging the semantics of inputs to propose questions, ignoring the logical coherence among generated questions (Q), images (V), answers (A), and corresponding acquired outside knowledge (K). It results in generating many non-expected questions with low quality, lacking insight and diversity, and some of them are even without any corresponding answer. To address this issue, we inject logical verification into the processes of knowledge acquisition and question generation, which is defined as LVˆ2-Net. Through checking the logical structure among V, A, K, ground-truth and generated Q twice in the whole KB-VQG procedure, LVˆ2-Net can propose diverse and insightful knowledge-based visual questions. And experimental results on two commonly used datasets demonstrate the superiority of LVˆ2-Net. Our code will be released to the public soon.

MCIL: Multimodal Counterfactual Instance Learning for Low-resource Entity-based Multimodal Information Extraction
Baohang Zhou | Ying Zhang | Kehui Song | Hongru Wang | Yu Zhao | Xuhui Sui | Xiaojie Yuan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multimodal information extraction (MIE) is a challenging task which aims to extract the structural information in free text coupled with the image for constructing the multimodal knowledge graph. The entity-based MIE tasks are based on the entity information to complete the specific tasks. However, the existing methods only investigated the entity-based MIE tasks under supervised learning with adequate labeled data. In the real-world scenario, collecting enough data and annotating the entity-based samples are time-consuming, and impractical. Therefore, we propose to investigate the entity-based MIE tasks under the low-resource settings. The conventional models are prone to overfitting on limited labeled data, which can result in poor performance. This is because the models tend to learn the bias existing in the limited samples, which can lead them to model the spurious correlations between multimodal features and task labels. To provide a more comprehensive understanding of the bias inherent in multimodal features of MIE samples, we decompose the features into image, entity, and context factors. Furthermore, we investigate the causal relationships between these factors and model performance, leveraging the structural causal model to delve into the correlations between the input features and output labels. Based on this, we propose the multimodal counterfactual instance learning framework to generate the counterfactual instances by the interventions on the limited observational samples. In the framework, we analyze the causal effect of the counterfactual instances and exploit it as a supervisory signal to maximize the effect for reducing the bias and improving the generalization of the model. Empirically, we evaluate the proposed method on the two public MIE benchmark datasets and the experimental results verify the effectiveness of it.

MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space
Xuhui Sui | Ying Zhang | Yu Zhao | Kehui Song | Baohang Zhou | Xiaojie Yuan
Findings of the Association for Computational Linguistics: ACL 2024

Multimodal entity linking (MEL), which aligns ambiguous mentions within multimodal contexts to referent entities from multimodal knowledge bases, is essential for many natural language processing applications. Previous MEL methods mainly focus on exploring complex multimodal interaction mechanisms to better capture coherence evidence between mentions and entities by mining complementary information. However, in real-world social media scenarios, vision modality often exhibits low quality, low value, or low relevance to the mention. Integrating such information directly will backfire, leading to a weakened consistency between mentions and their corresponding entities. In this paper, we propose a novel latent space vision feature optimization framework MELOV, which combines inter-modality and intra-modality optimizations to address these challenges. For the inter-modality optimization, we exploit the variational autoencoder to mine shared information and generate text-based visual features. For the intra-modality optimization, we consider the relationships between mentions and build graph convolutional network to aggregate the visual features of semantic similar neighbors. Extensive experiments on three benchmark datasets demonstrate the superiority of our proposed framework.

Bring Invariant to Variant: A Contrastive Prompt-based Framework for Temporal Knowledge Graph Forecasting
Ying Zhang | Xinying Qian | Yu Zhao | Baohang Zhou | Kehui Song | Xiaojie Yuan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Temporal knowledge graph forecasting aims to reason over known facts to complete the missing links in the future. Existing methods are highly dependent on the structures of temporal knowledge graphs and commonly utilize recurrent or graph neural networks for forecasting. However, entities that are infrequently observed or have not been seen recently face challenges in learning effective knowledge representations due to insufficient structural contexts. To address the above disadvantages, in this paper, we propose a Contrastive Prompt-based framework with Entity background information for TKG forecasting, which we named CoPET. Specifically, to bring the time-invariant entity background information to time-variant structural information, we employ a dual encoder architecture consisting of a candidate encoder and a query encoder. A contrastive learning framework is used to encourage the query representation to be closer to the candidate representation. We further propose three kinds of trainable time-variant prompts aimed at capturing temporal structural information. Experiments on two datasets demonstrate that our method is effective and stays competitive in inference with limited structural information. Our code is available at https://github.com/qianxinying/CoPET.

2023

Selecting Key Views for Zero-Shot Entity Linking
Xuhui Sui | Ying Zhang | Kehui Song | Baohang Zhou | Xiaojie Yuan | Wensheng Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023

Entity linking, which aligns mentions in the text to entities in knowledge bases, is essential for many natural language processing tasks. Considering the real-world scenarios, recent research hotspot of entity linking has focused on the zero-shot setting, where mentions need to link to unseen entities and only the description of each entity is provided. This task challenges the language understanding ability of models to capture the coherence evidence between the mention context and entity description. However, entity descriptions often contain rich information from multiple views, and a mention with context only relates to a small part of the information. Other irrelevant information will introduce noise, which interferes with models to make the right judgments. Furthermore, the existence of these information also makes it difficult to synthesize key information. To solve these problems, we select key views from descriptions and propose a KVZEL framework for zero-shot entity linking. Specifically, our KVZEL first adopts unsupervised clustering to form sub views. Then, it employs a mention-aware key views selection module to iteratively accumulate mention-focused views. This puts emphasis on capturing mention-related information and allows long-range key information integration. Finally, we aggregate key views to make the final decision. Experimental results show the effectiveness of our KVZEL and it achieves the new state-of-the-art on the zero-shot entity linking dataset.

BioFEG: Generate Latent Features for Biomedical Entity Linking
Xuhui Sui | Ying Zhang | Xiangrui Cai | Kehui Song | Baohang Zhou | Xiaojie Yuan | Wensheng Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Biomedical entity linking is an essential task in biomedical text processing, which aims to map entity mentions in biomedical text, such as clinical notes, to standard terms in a given knowledge base. However, this task is challenging due to the rarity of many biomedical entities in real-world scenarios, which often leads to a lack of annotated data for them. Limited by understanding these unseen entities, traditional biomedical entity linking models suffer from multiple types of linking errors. In this paper, we propose a novel latent feature generation framework BioFEG to address these challenges. Specifically, our BioFEG leverages domain knowledge to train a generative adversarial network, which generates latent semantic features of corresponding mentions for unseen entities. Utilizing these features, we fine-tune our entity encoder to capture fine-grained coherence information of unseen entities and better understand them. This allows models to make linking decisions more accurately, particularly for ambiguous mentions involving rare entities. Extensive experiments on the two benchmark datasets demonstrate the superiority of our proposed framework.

TacoPrompt: A Collaborative Multi-Task Prompt Learning Method for Self-Supervised Taxonomy Completion
Hongyuan Xu | Ciyi Liu | Yuhang Niu | Yunong Chen | Xiangrui Cai | Yanlong Wen | Xiaojie Yuan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Automatic taxonomy completion aims to attach the emerging concept to an appropriate pair of hypernym and hyponym in the existing taxonomy. Existing methods suffer from the overfitting to leaf-only problem caused by imbalanced leaf and non-leaf samples when training the newly initialized classification head. Besides, they only leverage subtasks, namely attaching the concept to its hypernym or hyponym, as auxiliary supervision for representation learning yet neglect the effects of subtask results on the final prediction. To address the aforementioned limitations, we propose TacoPrompt, a Collaborative Multi-Task Prompt Learning Method for Self-Supervised Taxonomy Completion. First, we perform triplet semantic matching using the prompt learning paradigm to effectively learn non-leaf attachment ability from imbalanced training samples. Second, we design the result context to relate the final prediction to the subtask results by a contextual approach, enhancing prompt-based multi-task learning. Third, we leverage a two-stage retrieval and re-ranking approach to improve the inference efficiency. Experimental results on three datasets show that TacoPrompt achieves state-of-the-art taxonomy completion performance. Codes are available at https://github.com/cyclexu/TacoPrompt.

Incorporating Object-Level Visual Context for Multimodal Fine-Grained Entity Typing
Ying Zhang | Wenbo Fan | Kehui Song | Yu Zhao | Xuhui Sui | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2023

Fine-grained entity typing (FGET) aims to assign appropriate fine-grained types to entity mentions within their context, which is an important foundational task in natural language processing. Previous approaches for FGET only utilized textual context information. However, in the form of short text, the contextual semantic information is often insufficient for FGET. In many real-world scenarios, text is often accompanied by images, and the visual context is valuable for FGET. To this end, we firstly propose a new task called multimodal fine-grained entity typing (MFGET). Then we construct a large-scale dataset for multimodal fine-grained entity typing called MFIGER based on FIGER. To fully leverage both textual and visual information, we propose a novel Multimodal Object-Level Visual Context Network (MOVCNet). MOVCNet can capture fine-grained semantic information by detecting objects in images, and effectively merge both textual and visual context. Experimental results demonstrate that our approach achieves superior classification performance compared to previous text-based approaches.

KAPALM: Knowledge grAPh enhAnced Language Models for Fake News Detection
Jing Ma | Chen Chen | Chunyan Hou | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2023

Social media has not only facilitated news consumption, but also led to the wide spread of fake news. Because news articles in social media is usually condensed and full of knowledge entities, existing methods of fake news detection use external entity knowledge. However, majority of these methods focus on news entity information and ignore the structured knowledge among news entities. To address this issue, in this work, we propose a Knowledge grAPh enhAnced Language Model (KAPALM) which is a novel model that fuses coarse- and fine-grained representations of entity knowledge from Knowledge Graphs (KGs). Firstly, we identify entities in news content and link them to entities in KGs. Then, a subgraph of KGs is extracted to provide structured knowledge of entities in KGs and fed into a graph neural network to obtain the coarse-grained knowledge representation. This subgraph is pruned to provide fine-grained knowledge and fed into the attentive graph and graph pooling layer. Finally, we integrate the coarse- and fine-grained entity knowledge representations with the textual representation for fake news detection. The experimental results on two benchmark datasets show that our method is superior to state-of-the-art baselines. In addition, it is competitive in the few-shot scenario.

Licon: A Diverse, Controllable and Challenging Linguistic Concept Learning Benchmark
Shenglong Yu | Ying Zhang | Wenya Guo | Zhengkun Zhang | Ru Zhou | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2023

Concept Learning requires learning the definition of a general category from given training examples. Most of the existing methods focus on learning concepts from images. However, the visual information cannot present abstract concepts exactly, which struggles the introduction of novel concepts related to known concepts (e.g., ‘Plant’→‘Asteroids’). In this paper, inspired by the fact that humans learn most concepts through linguistic description, we introduce Linguistic Concept Learning benchmark (Licon), where concepts in diverse forms (e.g., plain attributes, images, and text) are defined by linguistic descriptions. The difficulty to learn novel concepts can be controlled by the number of attributes or the hierarchical relationships between concepts. The diverse and controllable concepts are used to support challenging evaluation tasks, including concept classification, attribute prediction, and concept relationship recognition. In addition, we design an entailment-based concept learning method (EnC) to model the relationship among concepts. Extensive experiments demonstrate the effectiveness of EnC. The benchmark will be released to the public soon.

From Alignment to Entailment: A Unified Textual Entailment Framework for Entity Alignment
Yu Zhao | Yike Wu | Xiangrui Cai | Ying Zhang | Haiwei Zhang | Xiaojie Yuan
Findings of the Association for Computational Linguistics: ACL 2023

Entity Alignment (EA) aims to find the equivalent entities between two Knowledge Graphs (KGs). Existing methods usually encode the triples of entities as embeddings and learn to align the embeddings, which prevents the direct interaction between the original information of the cross-KG entities. Moreover, they encode the relational triples and attribute triples of an entity in heterogeneous embedding spaces, which prevents them from helping each other. In this paper, we transform both triples into unified textual sequences, and model the EA task as a bi-directional textual entailment task between the sequences of cross-KG entities. Specifically, we feed the sequences of two entities simultaneously into a pre-trained language model (PLM) and propose two kinds of PLM-based entity aligners that model the entailment probability between sequences as the similarity between entities. Our approach captures the unified correlation pattern of two kinds of information between entities, and explicitly models the fine-grained interaction between original entity information. The experiments on five cross-lingual EA datasets show that our approach outperforms the state-of-the-art EA methods and enables the mutual enhancement of the heterogeneous information. Codes are available at https://github.com/OreOZhao/TEA.

AoM: Detecting Aspect-oriented Information for Multimodal Aspect-Based Sentiment Analysis
Ru Zhou | Wenya Guo | Xumeng Liu | Shenglong Yu | Ying Zhang | Xiaojie Yuan
Findings of the Association for Computational Linguistics: ACL 2023

Multimodal aspect-based sentiment analysis (MABSA) aims to extract aspects from text-image pairs and recognize their sentiments. Existing methods make great efforts to align the whole image to corresponding aspects. However, different regions of the image may relate to different aspects in the same sentence, and coarsely establishing image-aspect alignment will introduce noise to aspect-based sentiment analysis (i.e., visual noise). Besides, the sentiment of a specific aspect can also be interfered by descriptions of other aspects (i.e., textual noise). Considering the aforementioned noises, this paper proposes an Aspect-oriented Method (AoM) to detect aspect-relevant semantic and sentiment information. Specifically, an aspect-aware attention module is designed to simultaneously select textual tokens and image blocks that are semantically related to the aspects. To accurately aggregate sentiment information, we explicitly introduce sentiment embedding into AoM, and use a graph convolutional network to model the vision-text and text-text interaction. Extensive experiments demonstrate the superiority of AoM to existing methods.

2022

Improving Zero-Shot Entity Linking Candidate Generation with Ultra-Fine Entity Type Information
Xuhui Sui | Ying Zhang | Kehui Song | Baohang Zhou | Guoqing Zhao | Xin Wei | Xiaojie Yuan
Proceedings of the 29th International Conference on Computational Linguistics

Entity linking, which aims at aligning ambiguous entity mentions to their referent entities in a knowledge base, plays a key role in multiple natural language processing tasks. Recently, zero-shot entity linking task has become a research hotspot, which links mentions to unseen entities to challenge the generalization ability. For this task, the training set and test set are from different domains, and thus entity linking models tend to be overfitting due to the tendency of memorizing the properties of entities that appear frequently in the training set. We argue that general ultra-fine-grained type information can help the linking models to learn contextual commonality and improve their generalization ability to tackle the overfitting problem. However, in the zero-shot entity linking setting, any type information is not available and entities are only identified by textual descriptions. Thus, we first extract the ultra-fine entity type information from the entity textual descriptions. Then, we propose a hierarchical multi-task model to improve the high-level zero-shot entity linking candidate generation task by utilizing the entity typing task as an auxiliary low-level task, which introduces extracted ultra-fine type information into the candidate generation task. Experimental results demonstrate the effectiveness of utilizing the ultra-fine entity type information and our proposed method achieves state-of-the-art performance.

Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances
Yike Wu | Yu Zhao | Shiwan Zhao | Ying Zhang | Xiaojie Yuan | Guoqing Zhao | Ning Jiang
Proceedings of the 29th International Conference on Computational Linguistics

Despite the great progress of Visual Question Answering (VQA), current VQA models heavily rely on the superficial correlation between the question type and its corresponding frequent answers (i.e., language priors) to make predictions, without really understanding the input. In this work, we define the training instances with the same question type but different answers as superficially similar instances, and attribute the language priors to the confusion of VQA model on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between the superficially similar instances. Specifically, for each training instance, we first construct a set that contains its superficially similar counterparts. Then we exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to further focus on the other parts of the input beyond the question type, which helps to overcome the language priors. Experimental results show that our method achieves the state-of-the-art performance on VQA-CP v2. Codes are available at Distinguishing-VQA.

A Span-based Multimodal Variational Autoencoder for Semi-supervised Multimodal Named Entity Recognition
Baohang Zhou | Ying Zhang | Kehui Song | Wenya Guo | Guoqing Zhao | Hongbin Wang | Xiaojie Yuan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multimodal named entity recognition (MNER) on social media is a challenging task which aims to extract named entities in free text and incorporate images to classify them into user-defined types. However, the annotation for named entities on social media demands a mount of human efforts. The existing semi-supervised named entity recognition methods focus on the text modal and are utilized to reduce labeling costs in traditional NER. However, the previous methods are not efficient for semi-supervised MNER. Because the MNER task is defined to combine the text information with image one and needs to consider the mismatch between the posted text and image. To fuse the text and image features for MNER effectively under semi-supervised setting, we propose a novel span-based multimodal variational autoencoder (SMVAE) model for semi-supervised MNER. The proposed method exploits modal-specific VAEs to model text and image latent features, and utilizes product-of-experts to acquire multimodal features. In our approach, the implicit relations between labels and multimodal features are modeled by multimodal VAE. Thus, the useful information of unlabeled data can be exploited in our method under semi-supervised setting. Experimental results on two benchmark datasets demonstrate that our approach not only outperforms baselines under supervised setting, but also improves MNER performance with less labeled data than existing semi-supervised methods.

PM²F²N: Patient Multi-view Multi-modal Feature Fusion Networks for Clinical Outcome Prediction
Ying Zhang | Baohang Zhou | Kehui Song | Xuhui Sui | Guoqing Zhao | Ning Jiang | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2022

Clinical outcome prediction is critical to the condition prediction of patients and management of hospital capacities. There are two kinds of medical data, including time series signals recorded by various devices and clinical notes in electronic health records (EHR), which are used for two common prediction targets: mortality and length of stay. Traditional methods focused on utilizing time series data but ignored clinical notes. With the development of deep learning, natural language processing (NLP) and multi-modal learning methods are exploited to jointly model the time series and clinical notes with different modals. However, the existing methods failed to fuse the multi-modal features of patients from different views. Therefore, we propose the patient multi-view multi-modal feature fusion networks for clinical outcome prediction. Firstly, from patient inner view, we propose to utilize the co-attention module to enhance the fine-grained feature interaction between time series and clinical notes from each patient. Secondly, the patient outer view is the correlation between patients, which can be reflected by the structural knowledge in clinical notes. We exploit the structural information extracted from clinical notes to construct the patient correlation graph, and fuse patients’ multi-modal features by graph neural networks (GNN). The experimental results on MIMIC-III benchmark demonstrate the superiority of our method.

TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding
Zichen Liu | Xuyuan Liu | Yanlong Wen | Guoqing Zhao | Fen Xia | Xiaojie Yuan
Proceedings of the 29th International Conference on Computational Linguistics

ICD coding is designed to assign the disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. In an attempt to improve the effectiveness and efficiency of manual coding, many methods have been proposed to automatically predict ICD codes from clinical notes. However, most previous works ignore the decisive information contained in structured medical data in EHRs, which is hard to be captured from the noisy clinical notes. In this paper, we propose a Tree-enhanced Multimodal Attention Network (TreeMAN) to fuse tabular features and textual features into multimodal representations by enhancing the text representations with tree-based features via the attention mechanism. Tree-based features are constructed according to decision trees learned from structured multimodal medical data, which capture the decisive information about ICD coding. We can apply the same multi-label classifier from previous text models to the multimodal representations to predict ICD codes. Experiments on two MIMIC datasets show that our method outperforms prior state-of-the-art ICD coding approaches. The code is available at https://github.com/liu-zichen/TreeMAN.

2021

TEMP: Taxonomy Expansion with Dynamic Margin Loss through Taxonomy-Paths
Zichen Liu | Hongyuan Xu | Yanlong Wen | Ning Jiang | HaiYing Wu | Xiaojie Yuan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

As an essential form of knowledge representation, taxonomies are widely used in various downstream natural language processing tasks. However, with the continuously rising of new concepts, many existing taxonomies are unable to maintain coverage by manual expansion. In this paper, we propose TEMP, a self-supervised taxonomy expansion method, which predicts the position of new concepts by ranking the generated taxonomy-paths. For the first time, TEMP employs pre-trained contextual encoders in taxonomy construction and hypernym detection problems. Experiments prove that pre-trained contextual embeddings are able to capture hypernym-hyponym relations. To learn more detailed differences between taxonomy-paths, we train the model with dynamic margin loss by a novel dynamic margin function. Extensive evaluations exhibit that TEMP outperforms prior state-of-the-art taxonomy expansion approaches by 14.3% in accuracy and 15.8% in mean reciprocal rank on three public benchmarks.

An End-to-End Progressive Multi-Task Learning Framework for Medical Named Entity Recognition and Normalization
Baohang Zhou | Xiangrui Cai | Ying Zhang | Xiaojie Yuan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Medical named entity recognition (NER) and normalization (NEN) are fundamental for constructing knowledge graphs and building QA systems. Existing implementations for medical NER and NEN are suffered from the error propagation between the two tasks. The mispredicted mentions from NER will directly influence the results of NEN. Therefore, the NER module is the bottleneck of the whole system. Besides, the learnable features for both tasks are beneficial to improving the model performance. To avoid the disadvantages of existing models and exploit the generalized representation across the two tasks, we design an end-to-end progressive multi-task learning model for jointly modeling medical NER and NEN in an effective way. There are three level tasks with progressive difficulty in the framework. The progressive tasks can reduce the error propagation with the incremental task settings which implies the lower level tasks gain the supervised signals other than errors from the higher level tasks to improve their performances. Besides, the context features are exploited to enrich the semantic information of entity mentions extracted by NER. The performance of NEN profits from the enhanced entity mention features. The standard entities from knowledge bases are introduced into the NER module for extracting corresponding entity mentions correctly. The empirical results on two publicly available medical literature datasets demonstrate the superiority of our method over nine typical methods.

Incorporating Circumstances into Narrative Event Prediction
Shichao Wang | Xiangrui Cai | HongBin Wang | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2021

The narrative event prediction aims to predict what happens after a sequence of events, which is essential to modeling sophisticated real-world events. Existing studies focus on mining the inter-events relationships while ignoring how the events happened, which we called circumstances. With our observation, the event circumstances indicate what will happen next. To incorporate event circumstances into the narrative event prediction, we propose the CircEvent, which adopts the two multi-head attention to retrieve circumstances at the local and global levels. We also introduce a regularization of attention weights to leverage the alignment between events and local circumstances. The experimental results demonstrate our CircEvent outperforms existing baselines by 12.2%. The further analysis demonstrates the effectiveness of our multi-head attention modules and regularization.

2010

Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches
Shuming Shi | Huibin Zhang | Xiaojie Yuan | Ji-Rong Wen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Co-authors

Hongbin Wang (王洪彬) 2

Wensheng Zhang 2

Zhengkun Zhang 1

Venues