Weihe Zhai


2024

Towards Faithful Knowledge Graph Explanation Through Deep Alignment in Commonsense Question Answering
Weihe Zhai | Arkaitz Zubiaga | Bingquan Liu | Chengjie Sun | Yalong Zhao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The fusion of language models (LMs) and knowledge graphs (KGs) is widely used in commonsense question answering, but generating faithful explanations remains challenging. Current methods often overlook the faithfulness of path decoding, leading to divergence between graph encoder outputs and model predictions. We identify confounding effects and LM-KG misalignment as key factors causing spurious explanations. To address this, we introduce the LM-KG Fidelity metric to assess the reliability of KG representations and propose the LM-KG Distribution-aware Alignment (LKDA) algorithm to improve explanation faithfulness. In the absence of ground-truth explanations, we evaluate KG explanations using the proposed Fidelity-Sparsity Trade-off Curve. Experiments on CommonsenseQA and OpenBookQA show that LKDA significantly enhances explanation fidelity and model performance, highlighting the need to address distributional misalignment for reliable commonsense reasoning.

2022

HIT&QMUL at SemEval-2022 Task 9: Label-Enclosed Generative Question Answering (LEG-QA)
Weihe Zhai | Mingqiang Feng | Arkaitz Zubiaga | Bingquan Liu
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents the second-place system for the R2VQ competence-based multimodal question answering shared task. The task involves semantic and cooking roles as well as text-image objects to assess how well a system understands the procedure described in a recipe. We approach the task with a text-to-text generative model based on the transformer architecture. As a result, the model generalises well to soft-constrained and other competence-based question answering problems. We propose a label-enclosed input method, which helps the model achieve a significant improvement from 65.34 (baseline) to 91.3. In addition to describing the submitted system, we investigate the impact of model architecture and label selection, along with remarks on error analysis. Finally, directions for future work are presented.