Bo Xu (徐波, 徐博) - ACL Anthology

Bo Xu

Also published as: 波徐, 博徐

2025

Multimodal named entity recognition (MNER) extends traditional named entity recognition (NER) by integrating visual and textual information. However, current methods still face significant challenges due to the text-image mismatch problem. Recent advancements in text-to-image synthesis provide promising solutions, as synthesized images can introduce additional visual context to enhance MNER model performance. To fully leverage the benefits of both original and synthesized images, we propose an adaptive mixup image augmentation method. This method generates augmented images by determining the mixing ratio based on the matching score between the text and image, utilizing a triplet loss-based Gaussian Mixture Model (TL-GMM). Our approach is highly adaptable and can be seamlessly integrated into existing MNER models. Extensive experiments demonstrate consistent performance improvements, and detailed ablation studies and case studies confirm the effectiveness of our method.

With the prevalence of Large Language Models (LLMs), recent studies have shifted paradigms and leveraged LLMs to tackle the challenging task of Text-to-SQL. Because of the complexity of real world databases, previous works adopt the retrieve-then-generate framework to retrieve relevant database schema and then to generate the SQL query. However, efficient embedding-based retriever suffers from lower retrieval accuracy, and more accurate LLM-based retriever is far more expensive to use, which hinders their applicability for broader applications. To overcome this issue, this paper proposes Gen-SQL, a novel generate-ground-regenerate framework, where we exploit prior knowledge from the LLM to enhance embedding-based retriever and reduce cost. Experiments on several datasets are conducted to demonstrate the effectiveness and scalability of our proposed method. We release our code and data at https://github.com/jieshi10/gensql.

pdf bib abs
HyperHatePrompt: A Hypergraph-based Prompting Fusion Model for Multimodal Hate Detection
Bo Xu | Erchen Yu | Jiahui Zhou | Hongfei Lin | Linlin Zong
Proceedings of the 31st International Conference on Computational Linguistics

Multimodal hate detection aims to identify hate content across multiple modalities for promoting a harmonious online environment. Despite promising progress, three critical challenges, the absence of implicit hateful cues, the cross-modal-induced hate, and the diversity of hate target groups, inherent in the multimodal hate detection task, have been overlooked. To address these challenges, we propose a hypergraph-based prompting fusion model. Our model first uses tailored prompts to infer implicit hateful cues. It then introduces hyperedges to capture cross-modal-induced hate and applies a diversity-oriented hyperedge expansion strategy to account for different hate target groups. Finally, hypergraph convolution fuses diverse hateful cues, enhancing the exploration of cross-modal hate and targeting specific groups. Experimental results on two benchmark datasets show that our model achieves state-of-the-art performance in multimodal hate detection.

Text-to-SQL is a technology that converts natural language questions into executable SQL queries, allowing users to query and manage relational databases more easily. In recent years, large language models have significantly advanced the development of text-to-SQL. However, existing methods often overlook validation of the generated results during the SQL generation process. Current error identification methods are mainly divided into self-correction approaches based on large models and feedback methods based on SQL execution, both of which have limitations. We categorize SQL errors into three main types: system errors, skeleton errors, and value errors, and propose a multi-grained error identification method. Experimental results demonstrate that this method can be integrated as a plugin into various methods, providing effective error identification and correction capabilities.

Conditional semantic textual similarity (C-STS) assesses the similarity between pairs of sentence representations under different conditions. The current method encounters the over-estimation issue of positive and negative samples. Specifically, the similarity within positive samples is excessively high, while that within negative samples is excessively low. In this paper, we focus on the C-STS task and develop a conditional contrastive learning framework that constructs positive and negative samples from two perspectives, achieving the following primary objectives: (1) adaptive selection of the optimization direction for positive and negative samples to solve the over-estimation problem, (2) fully balance of the effects of hard and false negative samples. We validate the proposed method with five models based on bi-encoder and tri-encoder architectures, the results show that our proposed method achieves state-of-the-art performance. The code is available at https://github.com/qinzeyang0919/CCL.

2024

Brain-inspired Spiking Neural Network (SNN) has demonstrated its effectiveness and efficiency in vision, natural language, and speech understanding tasks, indicating their capacity to “see”, “listen”, and “read”. In this paper, we design SpikeVoice, which performs high-quality Text-To-Speech (TTS) via SNN, to explore the potential of SNN to “speak”. A major obstacle to using SNN for such generative tasks lies in the demand for models to grasp long-term dependencies. The serial nature of spiking neurons, however, leads to the invisibility of information at future spiking time steps, limiting SNN models to capture sequence dependencies solely within the same time step. We term this phenomenon “partial-time dependency”. To address this issue, we introduce Spiking Temporal-Sequential Attention (STSA) in the SpikeVoice. To the best of our knowledge, SpikeVoice is the first TTS work in the SNN field. We perform experiments using four well-established datasets that cover both Chinese and English languages, encompassing scenarios with both single-speaker and multi-speaker configurations. The results demonstrate that SpikeVoice can achieve results comparable to Artificial Neural Networks (ANN) with only 10.5% energy consumption of ANN. Both our demo and code are available as supplementary material.

Short video fake news detection is crucial for combating the spread of misinformation. Current detection methods tend to aggregate features from individual modalities into multimodal features, overlooking the implicit opinions and the evolving nature of opinions across modalities. In this paper, we mine implicit opinions within short video news and promote the evolution of both explicit and implicit opinions across all modalities. Specifically, we design a prompt template to mine implicit opinions regarding the credibility of news from the textual component of videos. Additionally, we employ a diffusion model that encourages the interplay among diverse modal opinions, including those extracted through our implicit opinion prompts. Experimental results on a publicly available dataset for short video fake news detection demonstrate the superiority of our model over state-of-the-art methods.

pdf bib abs
Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks
Xinyue Liu | Yunlong Gao | Linlin Zong | Bo Xu
Findings of the Association for Computational Linguistics: EMNLP 2024

Meta-learning has emerged as a prominent technology for few-shot text classification and has achieved promising performance. However, existing methods often encounter difficulties in drawing accurate class prototypes from support set samples, primarily due to probable large intra-class differences and small inter-class differences within the task. Recent approaches attempt to incorporate external knowledge or pre-trained language models to augment data, but this requires additional resources and thus does not suit many few-shot scenarios. In this paper, we propose a novel solution to address this issue by adequately leveraging the information within the task itself. Specifically, we utilize label information to construct a task-adaptive metric space, thereby adaptively reducing the intra-class differences and magnifying the inter-class differences. We further employ the optimal transport technique to estimate class prototypes with query set samples together, mitigating the problem of inaccurate and ambiguous support set samples caused by large intra-class differences. We conduct extensive experiments on eight benchmark datasets, and our approach shows obvious advantages over state-of-the-art models across all the tasks on all the datasets. For reproducibility, all the datasets and codes are available at https://github.com/YvoGao/LAQDA.

pdf bib abs
Breaking the Boundaries: A Unified Framework for Chinese Named Entity Recognition Across Text and Speech
Jinzhong Ning | Yuanyuan Sun | Bo Xu | Zhihao Yang | Ling Luo | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

In recent years, with the vast and rapidly increasing amounts of spoken and textual data, Named Entity Recognition (NER) tasks have evolved into three distinct categories, i.e., text-based NER (TNER), Speech NER (SNER) and Multimodal NER (MNER). However, existing approaches typically require designing separate models for each task, overlooking the potential connections between tasks and limiting the versatility of NER methods. To mitigate these limitations, we introduce a new task named Integrated Multimodal NER (IMNER) to break the boundaries between different modal NER tasks, enabling a unified implementation of them. To achieve this, we first design a unified data format for inputs from different modalities. Then, leveraging the pre-trained MMSpeech model as the backbone, we propose an **I**ntegrated **M**ultimod**a**l **Ge**neration Framework (**IMAGE**), formulating the Chinese IMNER task as an entity-aware text generation task. Experimental results demonstrate the feasibility of our proposed IMAGE framework in the IMNER task. Our work in integrated multimodal learning in advancing the performance of NER may set up a new direction for future research in the field. Our source code is available at https://github.com/NingJinzhong/IMAGE4IMNER.

Disclaimer: Samples in this paper may be harmful and cause discomfort! Patronizing and condescending language (PCL) is a form of speech directed at vulnerable groups. As an essential branch of toxic language, this type of language exacerbates conflicts and confrontations among Internet communities and detrimentally impacts disadvantaged groups. Traditional pre-trained language models (PLMs) perform poorly in detecting PCL due to its implicit toxicity traits like hypocrisy and false sympathy. With the rise of large language models (LLMs), we can harness their rich emotional semantics to establish a paradigm for exploring implicit toxicity. In this paper, we introduce PclGPT, a comprehensive LLM benchmark designed specifically for PCL. We collect, annotate, and integrate the Pcl-PT/SFT dataset, and then develop a bilingual PclGPT-EN/CN model group through a comprehensive pre-training and supervised fine-tuning staircase process to facilitate implicit toxic detection. Group detection results and fine-grained detection from PclGPT and other models reveal significant variations in the degree of bias in PCL towards different vulnerable groups, necessitating increased societal attention to protect them.

pdf bib abs
Adaptive Reinforcement Tuning Language Models as Hard Data Generators for Sentence Representation
Bo Xu | Yifei Wu | Shouang Wei | Ming Du | Hongya Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Sentence representation learning is a fundamental task in NLP. Existing methods use contrastive learning (CL) to learn effective sentence representations, which benefit from high-quality contrastive data but require extensive human annotation. Large language models (LLMs) like ChatGPT and GPT4 can automatically generate such data. However, this alternative strategy also encounters challenges: 1) obtaining high-quality generated data from small-parameter LLMs is difficult, and 2) inefficient utilization of the generated data. To address these challenges, we propose a novel adaptive reinforcement tuning (ART) framework. Specifically, to address the first challenge, we introduce a reinforcement learning approach for fine-tuning small-parameter LLMs, enabling the generation of high-quality hard contrastive data without human feedback. To address the second challenge, we propose an adaptive iterative framework to guide the small-parameter LLMs to generate progressively harder samples through multiple iterations, thereby maximizing the utility of generated data. Experiments conducted on seven semantic text similarity tasks demonstrate that the sentence representation models trained using the synthetic data generated by our proposed method achieve state-of-the-art performance. Our code is available at https://github.com/WuNein/AdaptCL.

pdf bib abs
Beyond Linguistic Cues: Fine-grained Conversational Emotion Recognition via Belief-Desire Modelling
Bo Xu | Longjiao Li | Wei Luo | Mehdi Naseriparsa | Zhehuan Zhao | Hongfei Lin | Feng Xia
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Emotion recognition in conversation (ERC) is essential for dialogue systems to identify the emotions expressed by speakers. Although previous studies have made significant progress, accurate recognition and interpretation of similar fine-grained emotion properly accounting for individual variability remains a challenge. One particular under-explored area is the role of individual beliefs and desires in modelling emotion. Inspired by the Belief-Desire Theory of Emotion, we propose a novel method for conversational emotion recognition that incorporates both belief and desire to accurately identify emotions. We extract emotion-eliciting events from utterances and construct graphs that represent beliefs and desires in conversations. By applying message passing between nodes, our graph effectively models the utterance context, speaker’s global state, and the interaction between emotional beliefs, desires, and utterances. We evaluate our model’s performance by conducting extensive experiments on four popular ERC datasets and comparing it with multiple state-of-the-art models. The experimental results demonstrate the superiority of our proposed model and validate the effectiveness of each module in the model.

pdf bib abs
MNER-MI: A Multi-image Dataset for Multimodal Named Entity Recognition in Social Media
Shizhou Huang | Bo Xu | Changqun Li | Jiabo Ye | Xin Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recently, multimodal named entity recognition (MNER) has emerged as a vital research area within named entity recognition. However, current MNER datasets and methods are predominantly based on text and a single accompanying image, leaving a significant research gap in MNER scenarios involving multiple images. To address the critical research gap and enhance the scope of MNER for real-world applications, we propose a novel human-annotated MNER dataset with multiple images called MNER-MI. Additionally, we construct a dataset named MNER-MI-Plus, derived from MNER-MI, to ensure its generality and applicability. Based on these datasets, we establish a comprehensive set of strong and representative baselines and we further propose a simple temporal prompt model with multiple images to address the new challenges in multi-image scenarios. We have conducted extensive experiments to demonstrate that considering multiple images provides a significant improvement over a single image and can offer substantial benefits for MNER. Furthermore, our proposed method achieves state-of-the-art results on both MNER-MI and MNER-MI-Plus, demonstrating its effectiveness. The datasets and source code can be found at https://github.com/JinFish/MNER-MI.

pdf bib abs
RENN: A Rule Embedding Enhanced Neural Network Framework for Temporal Knowledge Graph Completion
Linlin Zong | Zhenrong Xie | Chi Ma | Xinyue Liu | Xianchao Zhang | Bo Xu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Temporal knowledge graph completion is a critical task within the knowledge graph domain. Existing approaches encompass deep neural network-based methods for temporal knowledge graph embedding and rule-based logical symbolic reasoning. However, the former may not adequately account for structural dependencies between relations.Conversely, the latter methods relies heavily on strict logical rule reasoning and lacks robustness in the face of fuzzy or noisy data. In response to these challenges, we present RENN, a groundbreaking framework that enhances temporal knowledge graph completion through rule embedding. RENN employs a three-step approach. First, it utilizes temporary random walk to extract temporal logic rules. Then, it pre-trains by learning embeddings for each logical rule and its associated relations, thereby enhancing the likelihood of existing quadruples and logical rules. Finally, it incorporates the embeddings of logical rules into the deep neural network. Our methodology has been validated through experiments conducted on various temporal knowledge graph models and datasets, consistently demonstrating its effectiveness and potential in improving temporal knowledge graph completion.

pdf bib abs
Take Its Essence, Discard Its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect
Junyu Lu | Bo Xu | Xiaokun Zhang | Kaiyuan Liu | Dongyu Zhang | Liang Yang | Hongfei Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Researchers have attempted to mitigate lexical bias in toxic language detection (TLD). However, existing methods fail to disentangle the “useful” and “misleading” impact of lexical bias on model decisions. Therefore, they do not effectively exploit the positive effects of the bias and lead to a degradation in the detection performance of the debiased model. In this paper, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the “useful impact” of lexical bias and eliminates the “misleading impact”. Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied on several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.

pdf bib abs
Temporal Knowledge Graph Reasoning with Dynamic Hypergraph Embedding
Xinyue Liu | Jianan Zhang | Chi Ma | Wenxin Liang | Bo Xu | Linlin Zong
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Reasoning over the Temporal Knowledge Graph (TKG) that predicts facts in the future has received much attention. Most previous works attempt to model temporal dynamics with knowledge graphs and graph convolution networks. However, these methods lack the consideration of high-order interactions between objects in TKG, which is an important factor to predict future facts. To address this problem, we introduce dynamic hypergraph embedding for temporal knowledge graph reasoning. Specifically, we obtain high-order interactions by constructing hypergraphs based on temporal knowledge graphs at different timestamps. Besides, we integrate the differences caused by time into the hypergraph representation in order to fit TKG. Then, we adapt dynamic meta-embedding for temporal hypergraph representation that allows our model to choose the appropriate high-order interactions for downstream reasoning. Experimental results on public TKG datasets show that our method outperforms the baselines. Furthermore, the analysis part demonstrates that the proposed method brings good interpretation for the predicted results.

The development of social platforms has facilitated the proliferation of disinformation, with memes becoming one of the most popular types of propaganda for disseminating disinformation on the internet. Effectively detecting the persuasion techniques hidden within memes is helpful in understanding user-generated content and further promoting the detection of disinformation on the internet. This paper demonstrates the approach proposed by Team DUTIR938 in Subtask 2b of SemEval-2024 Task 4. We propose a dual-channel model based on semi-supervised learning and model ensemble. We utilize CLIP to extract image features, and employ various pretrained language models under task-adaptive pretraining for text feature extraction. To enhance the detection and generalization capabilities of the model, we implement sample data augmentation using semi-supervised pseudo-labeling methods, introduce adversarial training strategies, and design a two-stage global model ensemble strategy. Our proposed method surpasses the provided baseline method, with Macro/Micro F1 values of 0.80910/0.83667 in the English leaderboard. Our submission ranks 3rd/19 in terms of Macro F1 and 1st/19 in terms of Micro F1.

2023

pdf bib abs
Just Like a Human Would, Direct Access to Sarcasm Augmented with Potential Result and Reaction
Changrong Min | Ximing Li | Liang Yang | Zhilin Wang | Bo Xu | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sarcasm, as a form of irony conveying mockery and contempt, has been widespread in social media such as Twitter and Weibo, where the sarcastic text is commonly characterized as an incongruity between the surface positive and negative situation. Naturally, it has an urgent demand to automatically identify sarcasm from social media, so as to illustrate people’s real views toward specific targets. In this paper, we develop a novel sarcasm detection method, namely Sarcasm Detector with Augmentation of Potential Result and Reaction (SD-APRR). Inspired by the direct access view, we treat each sarcastic text as an incomplete version without latent content associated with implied negative situations, including the result and human reaction caused by its observable content. To fill the latent content, we estimate the potential result and human reaction for each given training sample by [xEffect] and [xReact] relations inferred by the pre-trained commonsense reasoning tool COMET, and integrate the sample with them as an augmented one. We can then employ those augmented samples to train the sarcasm detector, whose encoder is a graph neural network with a denoising module. We conduct extensive empirical experiments to evaluate the effectiveness of SD-APRR. The results demonstrate that SD-APRR can outperform strong baselines on benchmark datasets.

pdf bib abs
Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks
Junyu Lu | Bo Xu | Xiaokun Zhang | Changrong Min | Liang Yang | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly due to limited datasets. Existing datasets suffer from a lack of fine-grained annotations, such as the toxic type and expressions with indirect toxicity. These fine-grained annotations are crucial factors for accurately detecting the toxicity of posts involved with lexical knowledge, which has been a challenge for researchers. To tackle this problem, we facilitate the fine-grained detection of Chinese toxic language by building a new dataset with benchmark results. First, we devised Monitor Toxic Frame, a hierarchical taxonomy to analyze the toxic type and expressions. Then, we built a fine-grained dataset ToxiCN, including both direct and indirect toxic samples. ToxiCN is based on an insulting vocabulary containing implicit profanity. We further propose a benchmark model, Toxic Knowledge Enhancement (TKE), by incorporating lexical features to detect toxic language. We demonstrate the usability of ToxiCN and the effectiveness of TKE based on a systematic quantitative and qualitative analysis.

2022

pdf bib abs
Different Data, Different Modalities! Reinforced Data Splitting for Effective Multimodal Information Extraction from Social Media Posts
Bo Xu | Shizhou Huang | Ming Du | Hongya Wang | Hui Song | Chaofeng Sha | Yanghua Xiao
Proceedings of the 29th International Conference on Computational Linguistics

Recently, multimodal information extraction from social media posts has gained increasing attention in the natural language processing community. Despite their success, current approaches overestimate the significance of images. In this paper, we argue that different social media posts should consider different modalities for multimodal information extraction. Multimodal models cannot always outperform unimodal models. Some posts are more suitable for the multimodal model, while others are more suitable for the unimodal model. Therefore, we propose a general data splitting strategy to divide the social media posts into two sets so that these two sets can achieve better performance under the information extraction models of the corresponding modalities. Specifically, for an information extraction task, we first propose a data discriminator that divides social media posts into a multimodal and a unimodal set. Then we feed these sets into the corresponding models. Finally, we combine the results of these two models to obtain the final extraction results. Due to the lack of explicit knowledge, we use reinforcement learning to train the data discriminator. Experiments on two different multimodal information extraction tasks demonstrate the effectiveness of our method. The source code of this paper can be found in https://github.com/xubodhu/RDS.

Intelligent medical services have attracted great research interests for providing automated medical consultation. However, the lack of corpora becomes a main obstacle to related research, particularly data from real scenarios. In this paper, we construct RealMedDial, a Chinese medical dialogue dataset based on real medical consultation. RealMedDial contains 2,637 medical dialogues and 24,255 utterances obtained from Chinese short-video clips of real medical consultations. We collected and annotated a wide range of meta-data with respect to medical dialogue including doctor profiles, hospital departments, diseases and symptoms for fine-grained analysis on language usage pattern and clinical diagnosis. We evaluate the performance of medical response generation, department routing and doctor recommendation on RealMedDial. Results show that RealMedDial are applicable to a wide range of NLP tasks with respect to medical dialogue.

pdf bib abs
GUTS at SemEval-2022 Task 4: Adversarial Training and Balancing Methods for Patronizing and Condescending Language Detection
Junyu Lu | Hao Zhang | Tongyue Zhang | Hongbo Wang | Haohao Zhu | Bo Xu | Hongfei Lin
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Patronizing and Condescending Language (PCL) towards vulnerable communities in general media has been shown to have potentially harmful effects. Due to its subtlety and the good intentions behind its use, the audience is not aware of the language’s toxicity. In this paper, we present our method for the SemEval-2022 Task4 titled “Patronizing and Condescending Language Detection”. In Subtask A, a binary classification task, we introduce adversarial training based on Fast Gradient Method (FGM) and employ pre-trained model in a unified architecture. For Subtask B, framed as a multi-label classification problem, we utilize various improved multi-label cross-entropy loss functions and analyze the performance of our method. In the final evaluation, our system achieved official rankings of 17/79 and 16/49 on Subtask A and Subtask B, respectively. In addition, we explore the relationship between PCL and emotional polarity and intensity it contains.

2021

pdf bib abs
结合标签转移关系的多任务笑点识别方法(Multi-task punchlines recognition method combined with label transfer relationship)
Tongyue Zhang (张童越) | Shaowu Zhang (张绍武) | Bo Xu (徐博) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

幽默在人类交流中扮演着重要角色,并大量存在于情景喜剧中。笑点(punchline)是情景喜剧实现幽默效果的形式之一,在情景喜剧笑点识别任务中,每条句子的标签代表该句是否为笑点,但是以往的笑点识别工作通常只通过建模上下文语义关系识别笑点,对标签的利用并不充分。为了充分利用标签序列中的信息,本文提出了一种新的识别方法,即结合条件随机场的单词级-句子级多任务学习模型,该模型在两方面进行了改进,首先将标签序列中相邻两个标签之间的转移关系看作幽默理论中不一致性的一种体现,并使用条件随机场学习这种转移关系,其次由于学习相邻标签之间的转移关系以及上下文语义关系均能够学习到铺垫和笑点之间的不一致性,两者之间存在相关性,为了使模型通过利用这种相关性提高笑点识别的效果,该模型引入了多任务学习方法,使用多任务学习方法同时学习每条句子的句义、组成每条句子的所有字符的词义,单词级别的标签转移关系以及句子级别的标签转移关系。本文在CCL2020“小牛杯”幽默计算—情景喜剧笑点识别评测任务的英文数据集上进行实验,结果表明,本文提出的方法比目前最好的方法提高了3.2%,在情景喜剧幽默笑点识别任务上取得了最好的效果,并通过消融实验证明了上述两方面改进的有效性。

软件源代码的理解则是软件协同开发与维护的核心,而源代码中占半数以上的标识符的理解则在软件理解中起到重要作用,传统软件工程主要研究通过命名规范限制标识符的命名过程以构造更易理解和交流的标识符。本文则在梳理分析常见编程语言命名规范的基础上,提出一种全新的标识符可理解性评价标准。具体而言,本文首先总结梳理了常见主流编程语言中的命名规范并类比自然语言语素概念本文提出基于软件语素的标识符构成过程,即标识符的构成可被视为软件语素的生成、排列和连接过程。在此基础上,本文提出一种结合自然语料库的软件标识符规范性评价方法,用来衡量软件标识符是否易于理解。最后,本文通过源代码理解数据集和乇乩乴乨乵乢平台中开源项目对规范性指标进行了验证性实验,结果表明本文提出的规范性分数能够很好衡量软件项目的可理解性。

pdf bib abs
Locality Preserving Sentence Encoding
Changrong Min | Yonghe Chu | Liang Yang | Bo Xu | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2021

Although researches on word embeddings have made great progress in recent years, many tasks in natural language processing are on the sentence level. Thus, it is essential to learn sentence embeddings. Recently, Sentence BERT (SBERT) is proposed to learn embeddings on the sentence level, and it uses the inner product (or, cosine similarity) to compute semantic similarity between sentences. However, this measurement cannot well describe the semantic structures among sentences. The reason is that sentences may lie on a manifold in the ambient space rather than distribute in an Euclidean space. Thus, cosine similarity cannot approximate distances on the manifold. To tackle the severe problem, we propose a novel sentence embedding method called Sentence BERT with Locality Preserving (SBERT-LP), which discovers the sentence submanifold from a high-dimensional space and yields a compact sentence representation subspace by locally preserving geometric structures of sentences. We compare the SBERT-LP with several existing sentence embedding approaches from three perspectives: sentence similarity, sentence classification and sentence clustering. Experimental results and case studies demonstrate that our method encodes sentences better in the sense of semantic structures.

pdf bib abs
Counterfactual Supporting Facts Extraction for Explainable Medical Record Based Diagnosis with Graph Network
Haoran Wu | Wei Chen | Shuang Xu | Bo Xu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Providing a reliable explanation for clinical diagnosis based on the Electronic Medical Record (EMR) is fundamental to the application of Artificial Intelligence in the medical field. Current methods mostly treat the EMR as a text sequence and provide explanations based on a precise medical knowledge base, which is disease-specific and difficult to obtain for experts in reality. Therefore, we propose a counterfactual multi-granularity graph supporting facts extraction (CMGE) method to extract supporting facts from irregular EMR itself without external knowledge bases in this paper. Specifically, we first structure the sequence of EMR into a hierarchical graph network and then obtain the causal relationship between multi-granularity features and diagnosis results through counterfactual intervention on the graph. Features having the strongest causal connection with the results provide interpretive support for the diagnosis. Experimental results on real Chinese EMR of the lymphedema demonstrate that our method can diagnose four types of EMR correctly, and can provide accurate supporting facts for the results. More importantly, the results on different diseases demonstrate the robustness of our approach, which represents the potential application in the medical field.

2020

pdf bib abs
基于BERT的端到端中文篇章事件抽取(A BERT-based End-to-End Model for Chinese Document-level Event Extraction)
Hongkuan Zhang (张洪宽) | Hui Song (宋晖) | Shuyi Wang (王舒怡) | Bo Xu (徐波)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

篇章级事件抽取研究从整篇文档中检测事件,识别出事件包含的元素并赋予每个元素特定的角色。本文针对限定领域的中文文档提出了基于BERT的端到端模型,在模型的元素和角色识别中依次引入前序层输出的事件类型以及实体嵌入表示,增强文本的事件、元素和角色关联表示,提高篇章中各事件所属元素的识别精度。在此基础上利用标题信息和事件五元组的嵌入式表示,实现主从事件的划分及元素融合。实验证明本文的方法与现有工作相比具有明显的提升。

pdf bib abs
Knowledge Aware Emotion Recognition in Textual Conversations via Multi-Task Incremental Transformer
Duzhen Zhang | Xiuyi Chen | Shuang Xu | Bo Xu
Proceedings of the 28th International Conference on Computational Linguistics

Emotion recognition in textual conversations (ERTC) plays an important role in a wide range of applications, such as opinion mining, recommender systems, and so on. ERTC, however, is a challenging task. For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances contain neutral emotion in conversations, as a result, the confusion between a few non-neutral utterances and much more neutral ones restrains the emotion recognition performance. In this paper, we propose a novel Knowledge Aware Incremental Transformer with Multi-task Learning (KAITML) to address these challenges. Firstly, we devise a dual-level graph attention mechanism to leverage commonsense knowledge, which augments the semantic information of the utterance. Then we apply the Incremental Transformer to encode multi-turn contextual utterances. Moreover, we are the first to introduce multi-task learning to alleviate the aforementioned confusion and thus further improve the emotion recognition performance. Extensive experimental results show that our KAITML model outperforms the state-of-the-art models across five benchmark datasets.

Knowledge selection plays an important role in knowledge-grounded dialogue, which is a challenging task to generate more informative responses by leveraging external knowledge. Recently, latent variable models have been proposed to deal with the diversity of knowledge selection by using both prior and posterior distributions over knowledge and achieve promising performance. However, these models suffer from a huge gap between prior and posterior knowledge selection. Firstly, the prior selection module may not learn to select knowledge properly because of lacking the necessary posterior information. Secondly, latent variable models suffer from the exposure bias that dialogue generation is based on the knowledge selected from the posterior distribution at training but from the prior distribution at inference. Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection. Experimental results on two knowledge-grounded dialogue datasets show that both PIPM and KDBTS achieve performance improvement over the state-of-the-art latent variable model and their combination shows further improvement.

2019

pdf bib abs
The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding
Yiqun Yao | Jiaming Xu | Bo Xu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Visual Dialog is a multi-modal task that requires a model to participate in a multi-turn human dialog grounded on an image, and generate correct, human-like responses. In this paper, we propose a novel Adversarial Multi-modal Feature Encoding (AMFE) framework for effective and robust auxiliary training of visual dialog systems. AMFE can force the language-encoding part of a model to generate hidden states in a distribution closely related to the distribution of real-world images, resulting in language features containing general knowledge from both modalities by nature, which can help generate both more correct and more general responses with reasonably low time cost. Experimental results show that AMFE can steadily bring performance gains to different models on different scales of data. Our method outperforms both the supervised learning baselines and other fine-tuning methods, achieving state-of-the-art results on most metrics of VisDial v0.5/v0.9 generative tasks.

pdf bib abs
A Working Memory Model for Task-oriented Dialog Response Generation
Xiuyi Chen | Jiaming Xu | Bo Xu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recently, to incorporate external Knowledge Base (KB) information, one form of world knowledge, several end-to-end task-oriented dialog systems have been proposed. These models, however, tend to confound the dialog history with KB tuples and simply store them into one memory. Inspired by the psychological studies on working memory, we propose a working memory model (WMM2Seq) for dialog response generation. Our WMM2Seq adopts a working memory to interact with two separated long-term memories, which are the episodic memory for memorizing dialog history and the semantic memory for storing KB tuples. The working memory consists of a central executive to attend to the aforementioned memories, and a short-term storage system to store the “activated” contents from the long-term memories. Furthermore, we introduce a context-sensitive perceptual process for the token representations of dialog history, and then feed them into the episodic memory. Extensive experiments on two task-oriented dialog datasets demonstrate that our WMM2Seq significantly outperforms the state-of-the-art results in several evaluation metrics.

2018

While the disfluency detection has achieved notable success in the past years, it still severely suffers from the data scarcity. To tackle this problem, we propose a novel semi-supervised approach which can utilize large amounts of unlabelled data. In this work, a light-weight neural net is proposed to extract the hidden features based solely on self-attention without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance the performance. Besides, the Generative Adversarial Network (GAN) training is applied to enforce the similar distribution between the labelled and unlabelled data. The experimental results show that our approach achieves significant improvements over strong baselines.

pdf bib abs
Cascaded Mutual Modulation for Visual Reasoning
Yiqun Yao | Jiaming Xu | Feng Wang | Bo Xu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Visual reasoning is a special visual question answering problem that is multi-step and compositional by nature, and also requires intensive text-vision interactions. We propose CMM: Cascaded Mutual Modulation as a novel end-to-end visual reasoning model. CMM includes a multi-step comprehension process for both question and image. In each step, we use a Feature-wise Linear Modulation (FiLM) technique to enable textual/visual pipeline to mutually control each other. Experiments show that CMM significantly outperforms most related models, and reach state-of-the-arts on two visual reasoning benchmarks: CLEVR and NLVR, collected from both synthetic and natural languages. Ablation studies confirm the effectiveness of CMM to comprehend natural language logics under the guidence of images. Our code is available at https://github.com/FlamingHorizon/CMM-VR.

Homographic puns have a long history in human writing, widely used in written and spoken literature, which usually occur in a certain syntactic or stylistic structure. How to recognize homographic puns is an important research. However, homographic pun recognition does not solve very well in existing work. In this work, we first use WordNet to understand and expand word embedding for settling the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA) which combined with the context weights for recognizing the puns. Our experiments on the SemEval2017 Task7 and Pun of the Day demonstrate that the proposed model is able to distinguish between homographic pun and non-homographic pun texts. We show the effectiveness of the model to present the capability of choosing qualitatively informative words. The results show that our model achieves the state-of-the-art performance on homographic puns recognition.

pdf bib abs
Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets
Zhen Yang | Wei Chen | Feng Wang | Bo Xu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

This paper proposes an approach for applying GANs to NMT. We build a conditional sequence generative adversarial net which comprises of two adversarial sub models, a generator and a discriminator. The generator aims to generate sentences which are hard to be discriminated from human-translated sentences ( i.e., the golden target sentences); And the discriminator makes efforts to discriminate the machine-generated sentences from human-translated ones. The two sub models play a mini-max game and achieve the win-win situation when they reach a Nash Equilibrium. Additionally, the static sentence-level BLEU is utilized as the reinforced objective for the generator, which biases the generation towards high BLEU points. During training, both the dynamic discriminator and the static BLEU objective are employed to evaluate the generated sentences and feedback the evaluations to guide the learning of the generator. Experimental results show that the proposed model consistently outperforms the traditional RNNSearch and the newly emerged state-of-the-art Transformer on English-German and Chinese-English translation tasks.

pdf bib abs
Unsupervised Neural Machine Translation with Weight Sharing
Zhen Yang | Wei Chen | Feng Wang | Bo Xu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Unsupervised neural machine translation (NMT) is a recently proposed approach for machine translation which aims to train the model without using any labeled data. The models proposed for unsupervised NMT often use only one shared encoder to map the pairs of sentences from different languages to a shared-latent space, which is weak in keeping the unique and internal characteristics of each language, such as the style, terminology, and sentence structure. To address this issue, we introduce an extension by utilizing two independent encoders but sharing some partial weights which are responsible for extracting high-level representations of the input sentences. Besides, two different generative adversarial networks (GANs), namely the local GAN and global GAN, are proposed to enhance the cross-language translation. With this new approach, we achieve significant improvements on English-German, English-French and Chinese-to-English translation tasks.

pdf bib abs
Construction of a Chinese Corpus for the Analysis of the Emotionality of Metaphorical Expressions
Dongyu Zhang | Hongfei Lin | Liang Yang | Shaowu Zhang | Bo Xu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Metaphors are frequently used to convey emotions. However, there is little research on the construction of metaphor corpora annotated with emotion for the analysis of emotionality of metaphorical expressions. Furthermore, most studies focus on English, and few in other languages, particularly Sino-Tibetan languages such as Chinese, for emotion analysis from metaphorical texts, although there are likely to be many differences in emotional expressions of metaphorical usages across different languages. We therefore construct a significant new corpus on metaphor, with 5,605 manually annotated sentences in Chinese. We present an annotation scheme that contains annotations of linguistic metaphors, emotional categories (joy, anger, sadness, fear, love, disgust and surprise), and intensity. The annotation agreement analyses for multiple annotators are described. We also use the corpus to explore and analyze the emotionality of metaphors. To the best of our knowledge, this is the first relatively large metaphor corpus with an annotation of emotions in Chinese.

2017

pdf bib abs
Towards Compact and Fast Neural Machine Translation Using a Combined Method
Xiaowei Zhang | Wei Chen | Feng Wang | Shuang Xu | Bo Xu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Neural Machine Translation (NMT) lays intensive burden on computation and memory cost. It is a challenge to deploy NMT models on the devices with limited computation and memory budgets. This paper presents a four stage pipeline to compress model and speed up the decoding for NMT. Our method first introduces a compact architecture based on convolutional encoder and weight shared embeddings. Then weight pruning is applied to obtain a sparse model. Next, we propose a fast sequence interpolation approach which enables the greedy decoding to achieve performance on par with the beam search. Hence, the time-consuming beam search can be replaced by simple greedy decoding. Finally, vocabulary selection is used to reduce the computation of softmax layer. Our final model achieves 10 times speedup, 17 times parameters reduction, less than 35MB storage size and comparable performance compared to the baseline model.

pdf bib abs
Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation
Chunqi Wang | Bo Xu
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they obtain good performance, they have two obvious weaknesses. The first is that they heavily rely on manually designed bigram feature, i.e. they are not good at capturing n-gram features automatically. The second is that they make no use of full word information. For the first weakness, we propose a convolutional neural model, which is able to capture rich n-gram features without any feature engineering. For the second one, we propose an effective approach to integrate the proposed model with word embeddings. We evaluate the model on two benchmark datasets: PKU and MSR. Without any feature engineering, the model obtains competitive performance — 95.7% on PKU and 97.3% on MSR. Armed with word embeddings, the model achieves state-of-the-art performance on both datasets — 96.5% on PKU and 98.0% on MSR, without using any external labeled resource.

pdf bib abs
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Suncong Zheng | Feng Wang | Hongyun Bao | Yuexing Hao | Peng Zhou | Bo Xu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction task to a tagging problem.. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by distant supervision method and the experimental results show that the tagging based methods are better than most of the existing pipelined and joint learning methods. What’s more, the end-to-end model proposed in this paper, achieves the best results on the public dataset.

2016

Recently, end-to-end memory networks have shown promising results on Question Answering task, which encode the past facts into an explicit memory and perform reasoning ability by making multiple computational steps on the memory. However, memory networks conduct the reasoning on sentence-level memory to output coarse semantic vectors and do not further take any attention mechanism to focus on words, which may lead to the model lose some detail information, especially when the answers are rare or unknown words. In this paper, we propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the past facts into sentence-level memory and word-level memory respectively. Then, k-max pooling is exploited following reasoning module on the sentence-level memory to sample the k most relevant sentences to a question and feed these sentences into attention mechanism on the word-level memory to focus the words in the selected sentences. Finally, the prediction is jointly learned over the outputs of the sentence-level reasoning module and the word-level attention mechanism. The experimental results demonstrate that our approach successfully conducts answer selection on unknown words and achieves a better performance than memory networks.

pdf bib abs
A Character-Aware Encoder for Neural Machine Translation
Zhen Yang | Wei Chen | Feng Wang | Bo Xu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This article proposes a novel character-aware neural machine translation (NMT) model that views the input sequences as sequences of characters rather than words. On the use of row convolution (Amodei et al., 2015), the encoder of the proposed model composes word-level information from the input sequences of characters automatically. Since our model doesn’t rely on the boundaries between each word (as the whitespace boundaries in English), it is also applied to languages without explicit word segmentations (like Chinese). Experimental results on Chinese-English translation tasks show that the proposed character-aware NMT model can achieve comparable translation performance with the traditional word based NMT models. Despite the target side is still word based, the proposed model is able to generate much less unknown words.

pdf bib abs
Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling
Peng Zhou | Zhenyu Qi | Suncong Zheng | Jiaming Xu | Hongyun Bao | Bo Xu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recurrent Neural Network (RNN) is one of the most popular architectures used in Natural Language Processsing (NLP) tasks because its recurrent structure is very suitable to process variable-length text. RNN can utilize distributed representations of words by first converting the tokens comprising each text into vectors, which form a matrix. And this matrix includes two dimensions: the time-step dimension and the feature vector dimension. Then most existing models usually utilize one-dimensional (1D) max pooling operation or attention-based operation only on the time-step dimension to obtain a fixed-length vector. However, the features on the feature vector dimension are not mutually independent, and simply applying 1D pooling operation over the time-step dimension independently may destroy the structure of the feature representation. On the other hand, applying two-dimensional (2D) pooling operation over the two dimensions may sample more meaningful features for sequence modeling tasks. To integrate the features on both dimensions of the matrix, this paper explores applying 2D max pooling operation to obtain a fixed-length representation of the text. This paper also utilizes 2D convolution to sample more meaningful information of the matrix. Experiments are conducted on six text classification tasks, including sentiment analysis, question classification, subjectivity classification and newsgroup classification. Compared with the state-of-the-art models, the proposed models achieve excellent performance on 4 out of 6 tasks. Specifically, one of the proposed models achieves highest accuracy on Stanford Sentiment Treebank binary classification and fine-grained classification tasks.

pdf bib abs
Combining Lexical and Semantic-based Features for Answer Sentence Selection
Jing Shi | Jiaming Xu | Yiqun Yao | Suncong Zheng | Bo Xu
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Question answering is always an attractive and challenging task in natural language processing area. There are some open domain question answering systems, such as IBM Waston, which take the unstructured text data as input, in some ways of humanlike thinking process and a mode of artificial intelligence. At the conference on Natural Language Processing and Chinese Computing (NLPCC) 2016, China Computer Federation hosted a shared task evaluation about Open Domain Question Answering. We achieve the 2nd place at the document-based subtask. In this paper, we present our solution, which consists of feature engineering in lexical and semantic aspects and model training methods. As the result of the evaluation shows, our solution provides a valuable and brief model which could be used in modelling question answering or sentence semantic relevance. We hope our solution would contribute to this vast and significant task with some heuristic thinking.

2015

pdf bib
Semi-supervised Chinese Word Segmentation based on Bilingual Information
Wei Chen | Bo Xu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Semantic Clustering and Convolutional Neural Network for Short Text Categorization
Peng Wang | Jiaming Xu | Bo Xu | Chenglin Liu | Heng Zhang | Fangyuan Wang | Hongwei Hao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Dialogue Management based on Sentence Clustering
Wendong Ge | Bo Xu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Dialogue Management based on Multi-domain Corpus
Wendong Ge | Bo Xu
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

pdf bib
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
Shixiang Lu | Zhenbiao Chen | Bo Xu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

In this paper, we describe the CASIA statistical machine translation (SMT) system for the IWSLT2013 Evaluation Campaign. We participated in the Chinese-English and English-Chinese translation tasks. For both of these tasks, we used a hierarchical phrase-based (HPB) decoder and made it as our baseline translation system. A number of techniques were proposed to deal with these translation tasks, including parallel sentence extraction, pre-processing, translation model (TM) optimization, language model (LM) interpolation, turning, and post-processing. With these techniques, the translation results were significantly improved compared with that of the baseline system.

pdf bib
Phrase-based Parallel Fragments Extraction from Comparable Corpora
Xiaoyin Fu | Wei Wei | Shixiang Lu | Zhenbiao Chen | Bo Xu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

MANOS (Multilingual Application Network for Olympic Services) project. aims to provide intelligent multilingual information services in 2008 Olympic Games. By narrowing down the general language technology, this paper gives an overview of our new work on Phrase-Based Statistical Machine Translation (PBT) under the framework of the MANOS. Starting with the construction of large scale Chinese-English corpus (sentence aligned) and introduction four methods to extract phrases, The promising results from PBT systems lead us to confidences for constructing a high-quality translation system and harmoniously integrate it into MANOS platform.

pdf bib
Chinese Named Entity Recognition with Multiple Features
Youzheng Wu | Jun Zhao | Bo Xu | Hao Yu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Product Named Entity Recognition Based on Hierarchical Hidden Markov Model
Feifan Liu | Jun Zhao | Bibo Lv | Bo Xu | Hao Yu
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing