Xiaokang Zhang


2024

pdf bib
Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking
Xiaokang Zhang | Zijun Yao | Jing Zhang | Kaifeng Yun | Jifan Yu | Juanzi Li | Jie Tang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper proposes PiNose, which trains a probing model on offline self-consistency checking results, thereby circumventing the need for human-annotated data and achieving transferability across diverse data distributions. As the consistency check process is offline, PiNose reduces the computational burden of generating multiple responses by online consistency verification. Additionally, it examines various aspects of internal states prior to response decoding, contributing to more effective detection of factual inaccuracies. Experiment results on both factuality detection and question answering benchmarks show that PiNose achieves surpassing results than existing factuality detection methods.

2022

pdf bib
Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering
Jing Zhang | Xiaokang Zhang | Jifan Yu | Jian Tang | Jie Tang | Cuiping Li | Hong Chen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent works on knowledge base question answering (KBQA) retrieve subgraphs for easier reasoning. The desired subgraph is crucial as a small one may exclude the answer but a large one might introduce more noises. However, the existing retrieval is either heuristic or interwoven with the reasoning, causing reasoning on the partial subgraphs, which increases the reasoning bias when the intermediate supervision is missing. This paper proposes a trainable subgraph retriever (SR) decoupled from the subsequent reasoning process, which enables a plug-and-play framework to enhance any subgraph-oriented KBQA model. Extensive experiments demonstrate SR achieves significantly better retrieval and QA performance than existing retrieval methods. Via weakly supervised pre-training as well as the end-to-end fine-tuning, SR achieves new state-of-the-art performance when combined with NSM (He et al., 2021), a subgraph-oriented reasoner, for embedding-based KBQA methods. Codes and datasets are available online (https://github.com/RUCKBReasoning/SubgraphRetrievalKBQA)

pdf bib
HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese
Daniel Zhang-li | Jing Zhang | Jifan Yu | Xiaokang Zhang | Peng Zhang | Jie Tang | Juanzi Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We investigate the usage of entity linking (EL)in downstream tasks and present the first modularized EL toolkit for easy task adaptation. Different from the existing EL methods that dealwith all the features simultaneously, we modularize the whole model into separate parts witheach feature. This decoupled design enablesflexibly adding new features without retraining the whole model as well as flow visualization with better interpretability of the ELresult. We release the corresponding toolkit,HOSMEL, for Chinese, with three flexible usage modes, a live demo, and a demonstrationvideo. Experiments on two benchmarks forthe question answering task demonstrate thatHOSMEL achieves much less time and spaceconsumption as well as significantly better accuracy performance compared with existingSOTA EL methods. We hope the release ofHOSMEL will call for more attention to studyEL for downstream tasks in non-English languages.