Hanxian Huang

2025

You Only Use Reactive Attention Slice When Retrieving From Long Context
Yun Joon Soh | Hanxian Huang | Yuandong Tian | Jishen Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Retrieval-Augmented Generation is a powerful method for enhancing language models (LMs), but existing retrieval techniques are limited. Embedding-based methods are often inaccurate due to their reliance on lexical similarity, while neural retrievers are computationally expensive to train. To overcome these issues, we introduce You Only Use Reactive Attention slice (YOURA), a training-free and fine-tuning-free attention-based retrieval technique. When retrieving, YOURA uses a novel reaction score heuristic, which quantifies how an LM’s self-attention “reacts” to a user query. We also propose a sentence extraction algorithm to efficiently preprocess the context. Evaluations on three open-source LMs using the LongBench and BABILong datasets show YOURA’s effectiveness. Our framework improves QA task accuracy by up to 15% and inference throughput by up to 31% compared to embedding-based retrieval.

2024

Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from an Information Bottleneck perspective and formulate it as maximizing the mutual information of the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code is available at https://github.com/xinchen9/cot_distillation_ACL2024.

Co-authors

Yuandong Tian 1

Yi Wang 1

Venues

findings 2
