Zihan Xu


pdf bib
Sinkhorn Distance Minimization for Knowledge Distillation
Xiao Cui | Yulei Qin | Yuting Gao | Enwei Zhang | Zihan Xu | Tong Wu | Ke Li | Xing Sun | Wengang Zhou | Houqiang Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences. However, due to limitations inherent in their assumptions and definitions, these measures fail to deliver effective supervision when few distribution overlap exists between the teacher and the student. In this paper, we show that the aforementioned KL, RKL, and JS divergences respectively suffer from issues of mode-averaging, mode-collapsing, and mode-underestimation, which deteriorates logits-based KD for diverse NLP tasks. We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions. Besides, profit by properties of the Sinkhorn metric, we can get rid of sample-wise KD that restricts the perception of divergence in each teacher-student sample pair. Instead, we propose a batch-wise reformulation to capture geometric intricacies of distributions across samples in the high-dimensional space. Comprehensive evaluation on GLUE and SuperGLUE, in terms of comparability, validity, and generalizability, highlights our superiority over state-of-the-art methods on all kinds of LLMs with encoder-only, encoder-decoder, and decoder-only architectures.


pdf bib
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser
Zhi Chen | Lu Chen | Yanbin Zhao | Ruisheng Cao | Zihan Xu | Su Zhu | Kai Yu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Given a database schema, Text-to-SQL aims to translate a natural language question into the corresponding SQL query. Under the setup of cross-domain, traditional semantic parsing models struggle to adapt to unseen database schemas. To improve the model generalization capability for rare and unseen schemas, we propose a new architecture, ShadowGNN, which processes schemas at abstract and semantic levels. By ignoring names of semantic items in databases, abstract schemas are exploited in a well-designed graph projection neural network to obtain delexicalized representation of question and schema. Based on the domain-independent representations, a relation-aware transformer is utilized to further extract logical linking between question and schema. Finally, a SQL decoder with context-free grammar is applied. On the challenging Text-to-SQL benchmark Spider, empirical results show that ShadowGNN outperforms state-of-the-art models. When the annotated data is extremely limited (only 10% training set), ShadowGNN gets over absolute 5% performance gain, which shows its powerful generalization ability. Our implementation will be open-sourced at https://github.com/WowCZ/shadowgnn


pdf bib
Answer-driven Deep Question Generation based on Reinforcement Learning
Liuyin Wang | Zihan Xu | Zibo Lin | Haitao Zheng | Ying Shen
Proceedings of the 28th International Conference on Computational Linguistics

Deep question generation (DQG) aims to generate complex questions through reasoning over multiple documents. The task is challenging and underexplored. Existing methods mainly focus on enhancing document representations, with little attention paid to the answer information, which may result in the generated question not matching the answer type and being answerirrelevant. In this paper, we propose an Answer-driven Deep Question Generation (ADDQG) model based on the encoder-decoder framework. The model makes better use of the target answer as a guidance to facilitate question generation. First, we propose an answer-aware initialization module with a gated connection layer which introduces both document and answer information to the decoder, thus helping to guide the choice of answer-focused question words. Then a semantic-rich fusion attention mechanism is designed to support the decoding process, which integrates the answer with the document representations to promote the proper handling of answer information during generation. Moreover, reinforcement learning is applied to integrate both syntactic and semantic metrics as the reward to enhance the training of the ADDQG. Extensive experiments on the HotpotQA dataset show that ADDQG outperforms state-of-the-art models in both automatic and human evaluations.