Jing Zhu
2024
Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
Yuhang Zhou
|
Jing Zhu
|
Paiheng Xu
|
Xiaoyu Liu
|
Xiyao Wang
|
Danai Koutra
|
Wei Ai
|
Furong Huang
Findings of the Association for Computational Linguistics: EMNLP 2024
Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. Particularly, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students’ reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, adversely affecting generalization on sparsely represented domains. We introduce the Multi-Stage Balanced Distillation (BalDistill) framework, which iteratively balances training data within a fixed computational budget. By dynamically selecting representative head domain examples and synthesizing tail domain examples, BalDistill achieves state-of-the-art performance across diverse long-tailed datasets, enhancing both the efficiency and efficacy of the distilled models.
2021
NegatER: Unsupervised Discovery of Negatives in Commonsense Knowledge Bases
Tara Safavi
|
Jing Zhu
|
Danai Koutra
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Codifying commonsense knowledge in machines is a longstanding goal of artificial intelligence. Recently, much progress toward this goal has been made with automatic knowledge base (KB) construction techniques. However, such techniques focus primarily on the acquisition of positive (true) KB statements, even though negative (false) statements are often also important for discriminative reasoning over commonsense KBs. As a first step toward the latter, this paper proposes NegatER, a framework that ranks potential negatives in commonsense KBs using a contextual language model (LM). Importantly, as most KBs do not contain negatives, NegatER relies only on the positive knowledge in the LM and does not require ground-truth negative examples. Experiments demonstrate that, compared to multiple contrastive data augmentation approaches, NegatER yields negatives that are more grammatical, coherent, and informative—leading to statistically significant accuracy improvements in a challenging KB completion task and confirming that the positive knowledge in LMs can be “re-purposed” to generate negative knowledge.
2006
An HMM-Based Approach to Automatic Phrasing for Mandarin Text-to-Speech Synthesis
Jing Zhu
|
Jian-Hua Li
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions