Longyun Wu
2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu | Dawei Zhu | Guangxiang Zhao | Zhuocheng Yu | Junfeng Ran | Xiangyu Wong | Lin Sun | Sujian Li
Findings of the Association for Computational Linguistics: ACL 2025
With the development of large language models (LLMs), the ability to handle long contexts has become increasingly important. Enhancing long-context capabilities requires high-quality training data with **long-range dependencies**. Existing methods for selecting long-context data often rely on sentence-level analysis, leaving considerable room for improvement in both performance and efficiency. In this paper, we propose **LongAttn**, a novel token-level framework that leverages the self-attention mechanism of LLMs to measure the **long-range dependencies** in the data. By calculating the token-level dependency strength and the distribution uniformity of token scores, LongAttn effectively quantifies **long-range dependencies**, enabling more accurate and efficient data selection. Using LongAttn, we filter **LongABC-32K** from open-source long-context datasets (ArXiv, Book, and Code). Comprehensive experiments demonstrate the **effectiveness**, **scalability**, and **efficiency** of LongAttn. We will release our code and the high-quality long-context dataset **LongABC-32K** in the future.
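To make the scoring idea in the abstract concrete, the sketch below ranks documents by token-level long-range attention. This is a minimal illustration under stated assumptions, not the paper's implementation: the attention matrix is assumed to be pre-extracted and averaged over heads and layers, and the distance threshold `gap`, the entropy-based uniformity measure, and the mixing weight `alpha` are all illustrative choices.

```python
# Hypothetical sketch: score documents by token-level long-range attention.
# `attn` is assumed to be a (seq_len, seq_len) attention matrix averaged
# over heads/layers; the gap threshold and entropy-based uniformity are
# illustrative stand-ins for the paper's exact definitions.
import numpy as np

def long_range_dependency_score(attn: np.ndarray, gap: int = 1024) -> float:
    """Mean attention mass each token places on tokens at least `gap` away."""
    seq_len = attn.shape[0]
    idx = np.arange(seq_len)
    distant = np.abs(idx[:, None] - idx[None, :]) >= gap  # long-range pairs
    return float((attn * distant).sum(axis=-1).mean())

def uniformity_score(attn: np.ndarray) -> float:
    """Normalized entropy of per-token received attention.

    Close to 1 when attention spreads evenly across tokens, close to 0
    when a few tokens absorb most of the attention mass.
    """
    received = attn.sum(axis=0)
    p = received / received.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    return float(entropy / np.log(len(p)))

def select_documents(attn_matrices, gap=1024, alpha=0.5, top_k=100):
    """Rank documents by a weighted mix of dependency strength and uniformity."""
    scores = [
        alpha * long_range_dependency_score(a, gap)
        + (1 - alpha) * uniformity_score(a)
        for a in attn_matrices
    ]
    order = np.argsort(scores)[::-1]  # highest-scoring documents first
    return order[:top_k]
```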
BiSaGA: A Novel Bidirectional Sparse Graph Attention Adapter for Evidence-Based Fact-Checking
Junfeng Ran | Weiyao Luo | Zailong Tian | Guangxiang Zhao | Dawei Zhu | Longyun Wu | Hailiang Huang | Sujian Li
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"Evidence-based fact-checking aims to verify or debunk claims using evidence and has greatly benefited from advancements in Large Language Models (LLMs). This task relies on clarify-ing and discriminating relations between entities. However, autoregressive LLMs struggle with understanding relations presented in different orders or narratives, as their unidirectional na-ture hampers effective performance. To address this challenge, we propose a novel method that leverages bidirectional attention as an external adapter to facilitate two-way information aggregation. Additionally, we employ hierarchical sparse graphs to merge local and global information and introduce an efficient feature-compression technique to minimize the number of adapter parameters. Experimental results on both English and Chinese datasets demonstrate the significant improvements achieved by our approach, showcasing state-of-the-art performance in the evidence-based fact-checking task."