Zhiyong Feng


2024

Mitigating Language Bias of LMMs in Social Intelligence Understanding with Virtual Counterfactual Calibration
Peng Chen | Xiao-Yu Guo | Yuan-Fang Li | Xiaowang Zhang | Zhiyong Feng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
Jingyuan Yang | Dapeng Chen | Yajing Sun | Rongjun Li | Zhiyong Feng | Wei Peng
Findings of the Association for Computational Linguistics: ACL 2024

A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a “black box”, restricting our ability to gain deeper insights into its internal mechanism. In this paper, we aim to enhance the semantic consistency of LLMs through a more interpretable method (i.e., model editing). We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency of an LLM. We subsequently inject biases into the output of these model components along the semantic-consistency activation direction. It is noteworthy that these modifications are cost-effective, without reliance on mass manipulations of the original model parameters. Through comprehensive experiments on the constructed NLU and open-source NLG datasets, our method demonstrates significant improvements in the semantic consistency and task performance of LLMs. Additionally, our method exhibits promising generalization capabilities by performing well on tasks beyond the primary tasks.
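The bias-injection step described in this abstract can be pictured with a minimal sketch. It assumes each attention head's output is a plain vector and that a per-head "semantic-consistency direction" has already been estimated somehow; the function name `inject_bias`, the scaling factor `alpha`, and the toy numbers are hypothetical and are not taken from the paper.

```python
def inject_bias(head_outputs, directions, selected_heads, alpha=1.0):
    """Return a copy of head_outputs where each selected head is shifted
    by alpha times its unit-normalized direction vector.

    head_outputs   : list of per-head output vectors (lists of floats)
    directions     : dict mapping head index -> direction vector (assumed given)
    selected_heads : indices of the heads chosen for editing
    alpha          : hypothetical strength of the injected bias
    """
    steered = [list(vec) for vec in head_outputs]  # leave the input untouched
    for h in selected_heads:
        direction = directions[h]
        norm = sum(x * x for x in direction) ** 0.5
        if norm == 0.0:
            continue  # a zero direction carries no information; skip it
        for i, x in enumerate(direction):
            steered[h][i] += alpha * x / norm
    return steered


# Toy example: three heads with 2-dimensional outputs; only head 1 is edited.
head_outputs = [[0.5, -0.2], [1.0, 0.0], [0.1, 0.3]]
directions = {1: [3.0, 4.0]}
steered = inject_bias(head_outputs, directions, selected_heads=[1], alpha=2.0)
```

Only the outputs of the selected heads change and no model weights are modified, which is consistent with the abstract's point that the edit avoids mass manipulation of the original parameters.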

An Event-based Abductive Learning for Hard Time-sensitive Question Answering
Shaojuan Wu | Jitong Li | Xiaowang Zhang | Zhiyong Feng
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Time-Sensitive Question Answering (TSQA) is the task of answering questions qualified by a certain timestamp based on a given document. It is split into easy and hard modes depending on whether the document contains the time qualifiers mentioned in the question. While existing models perform well in easy mode, their performance is significantly reduced on hard time-sensitive questions, whose time qualifiers are implicit in the document. An intuitive idea is to match temporal events in the given document by treating a time-sensitive question as a temporal event with missing objects. However, not all temporal events extracted from the document have explicit time qualifiers. In this paper, we propose an Event-AL framework, in which a graph pruning model is designed to locate the timespan of implicit temporal events by capturing temporal relations between events. Moreover, we present an abductive reasoning module to determine proper objects while providing explanations. Besides, as the same relation may be scattered throughout the document in diverse expressions, a relation-based prompt is introduced to instruct LLMs in extracting candidate temporal events. We conduct extensive experiments, and the results show that Event-AL outperforms strong baselines on hard time-sensitive questions, with a 12.7% improvement in EM scores. In addition, it also exhibits great superiority for multi-answer and beyond hard time-sensitive questions.
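The "intuitive idea" in this abstract, treating the question as a temporal event whose object is missing and matching it against events from the document, can be sketched as follows. This is only an illustration of that matching step under simple assumptions (year-level timespans, exact string matching); the `TemporalEvent` type, the `answer_question` helper, and the toy events are hypothetical and are not the Event-AL implementation.

```python
from typing import List, NamedTuple, Optional


class TemporalEvent(NamedTuple):
    subject: str
    relation: str
    obj: str
    start: Optional[int]  # None when the document gives no explicit time qualifier
    end: Optional[int]


def answer_question(events: List[TemporalEvent], subject: str,
                    relation: str, year: int) -> List[str]:
    """Fill in the missing object of the question event (subject, relation, ?, year)
    by returning the objects of events whose explicit timespan covers the year."""
    return [
        e.obj
        for e in events
        if e.subject == subject
        and e.relation == relation
        and e.start is not None
        and e.end is not None
        and e.start <= year <= e.end
    ]


# Toy events; the last one has no explicit timespan, so naive matching ignores it.
events = [
    TemporalEvent("Alice", "member of", "Team A", 2001, 2005),
    TemporalEvent("Alice", "member of", "Team B", 2006, 2010),
    TemporalEvent("Alice", "member of", "Team C", None, None),
]
print(answer_question(events, "Alice", "member of", 2007))
```

The event without an explicit timespan can never be matched this way, which is the gap the abstract says the graph pruning model addresses by locating the timespans of implicit temporal events.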

2023

Causal Intervention for Mitigating Name Bias in Machine Reading Comprehension
Jiazheng Zhu | Shaojuan Wu | Xiaowang Zhang | Yuexian Hou | Zhiyong Feng
Findings of the Association for Computational Linguistics: ACL 2023

Machine Reading Comprehension (MRC), the task of answering questions based on a given passage, has made great achievements using pre-trained Language Models (LMs). We study the robustness of MRC models to names, which are flexible and repeatable. MRC models based on LMs may overuse name information to make predictions, which causes the representations of names to be non-interchangeable, a phenomenon called name bias. In this paper, we propose a novel Causal Interventional paradigm for MRC (CI4MRC) to mitigate name bias. Specifically, we uncover that the pre-trained knowledge concerning names is indeed a confounder by analyzing the causalities among the pre-trained knowledge, context representation and answers based on a Structural Causal Model (SCM). We develop effective CI4MRC algorithmic implementations to constrain the confounder based on the neuron-wise and token-wise adjustments. Experiments demonstrate that our proposed CI4MRC effectively mitigates the name bias and achieves competitive performance on the original SQuAD. Moreover, our method is general to various pre-trained LMs and performs robustly on the adversarial datasets.

Document-level Relationship Extraction by Bidirectional Constraints of Beta Rules
Yichun Liu | Zizhong Zhu | Xiaowang Zhang | Zhiyong Feng | Daoqi Chen | Yaxin Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Document-level Relation Extraction (DocRE) aims to extract relations among entity pairs in documents. Some works introduce logic constraints into DocRE, addressing the issues of opacity and weak logic in original DocRE models. However, they only focus on forward logic constraints, and the rules mined in these works often suffer from pseudo rules with high standard-confidence but low support. In this paper, we propose Bidirectional Constraints of Beta Rules (BCBR), a novel logic constraint framework. BCBR first introduces a new rule miner which models rules by beta contribution. Then forward and reverse logic constraints are constructed based on beta rules. Finally, BCBR reconstructs the rule consistency loss by bidirectional constraints to regulate the output of the DocRE model. Experiments show that BCBR outperforms original DocRE models in terms of relation extraction performance (~2.7 F1 score) and logical consistency (~3.1 logic score). Furthermore, BCBR consistently outperforms two other logic constraint frameworks.
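The "pseudo rule" problem mentioned in this abstract (high standard confidence but low support) can be illustrated with a small calculation. The sketch below compares standard confidence with the mean of a Beta posterior under a uniform prior; this is a generic statistical illustration, and it is an assumption, not something the abstract confirms, that BCBR's rule miner works this way. The function names and the counts are hypothetical.

```python
from fractions import Fraction


def standard_confidence(support, body_count):
    """Fraction of body matches for which the rule head also holds."""
    return Fraction(support, body_count)


def beta_posterior_mean(support, body_count, prior_a=1, prior_b=1):
    """Mean of a Beta(prior_a + successes, prior_b + failures) posterior.
    With few observations the estimate stays close to the prior mean."""
    a = prior_a + support
    b = prior_b + (body_count - support)
    return Fraction(a, a + b)


# (support, body_count): a rule seen twice versus a rule seen a hundred times.
rare = (2, 2)
frequent = (90, 100)

print(standard_confidence(*rare), beta_posterior_mean(*rare))          # 1 3/4
print(standard_confidence(*frequent), beta_posterior_mean(*frequent))  # 9/10 91/102
```

By standard confidence the rarely observed rule ranks above the frequent one (1 versus 9/10), while the Beta-based estimate reverses the order (3/4 versus 91/102), since two observations are weak evidence.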

2022

Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension
Linjuan Wu | Shaojuan Wu | Xiaowang Zhang | Deyi Xiong | Shizhan Chen | Zhiqiang Zhuang | Zhiyong Feng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multilingual pre-trained models are able to zero-shot transfer knowledge from rich-resource to low-resource languages in machine reading comprehension (MRC). However, inherent linguistic discrepancies in different languages could make answer spans predicted by zero-shot transfer violate syntactic constraints of the target language. In this paper, we propose a novel multilingual MRC framework equipped with a Siamese Semantic Disentanglement Model (S2DM) to disassociate semantics from syntax in representations learned by multilingual pre-trained models. To explicitly transfer only semantic knowledge to the target language, we propose two groups of losses tailored for semantic and syntactic encoding and disentanglement. Experimental results on three multilingual MRC datasets (i.e., XQuAD, MLQA, and TyDi QA) demonstrate the effectiveness of our proposed approach over models based on mBERT and XLM-100.

2021

Re-embedding Difficult Samples via Mutual Information Constrained Semantically Oversampling for Imbalanced Text Classification
Jiachen Tian | Shizhan Chen | Xiaowang Zhang | Zhiyong Feng | Deyi Xiong | Shaojuan Wu | Chunliu Dou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Difficult samples of the minority class in imbalanced text classification are usually hard to classify, as they are embedded into a semantic region that overlaps with the majority class. In this paper, we propose a Mutual Information constrained Semantically Oversampling framework (MISO) that can generate anchor instances to help the backbone network determine the re-embedding position of a non-overlapping representation for each difficult sample. MISO consists of (1) a semantic fusion module that learns entangled semantics among difficult and majority samples with an adaptive multi-head attention mechanism, (2) a mutual information loss that forces our model to learn new representations of entangled semantics in the non-overlapping region of the minority class, and (3) a coupled adversarial encoder-decoder that fine-tunes disentangled semantic representations to retain their correlations with the minority class, and then uses these disentangled semantic representations to generate anchor instances for each difficult sample. Experiments on a variety of imbalanced text classification tasks demonstrate that anchor instances help classifiers achieve significant improvements over strong baselines.