Yuquan Le


2025

pdf bib
TermDiffuSum: A Term-guided Diffusion Model for Extractive Summarization of Legal Documents
Xiangyun Dong | Wei Li | Yuquan Le | Zhangyue Jiang | Junxi Zhong | Zhong Wang
Proceedings of the 31st International Conference on Computational Linguistics

Extractive summarization for legal documents aims to automatically extract key sentences from legal texts to form concise summaries. Recent studies have explored diffusion models for extractive summarization task, showcasing their remarkable capabilities. Despite these advancements, these models often fall short in effectively capturing and leveraging the specialized legal terminology crucial for accurate legal summarization. To address the limitation, this paper presents a novel term-guided diffusion model for extractive summarization of legal documents, named TermDiffuSum. It incorporates legal terminology into the diffusion model via a well-designed multifactor fusion noise weighting schedule, which allocates higher attention weight to sentences containing a higher concentration of legal terms during the diffusion process. Additionally, TermDiffuSum utilizes a re-ranking loss function to refine the model’s selection of more relevant summaries by leveraging the relationship between the candidate summaries generated by the diffusion process and the reference summaries. Experimental results on a self-constructed legal summarization dataset reveal that TermDiffuSum outperforms existing diffusion-based summarization models, achieving improvements of 3.10 in ROUGE-1, 2.84 in ROUGE-2, and 2.89 in ROUGE-L. To further validate the generalizability of TermDiffuSum, we conduct experiments on three public datasets from news and social media domains, with results affirming the scalability of our approach.

2020

pdf bib
BCTH: A Novel Text Hashing Approach via Bayesian Clustering
Ying Wenjie | Yuquan Le | Hantao Xiong
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Similarity search is to find the most similar items for a certain target item. The ability of similarity search at large scale plays a significant role in many information retrieval applications, and thus has received much attention. Text hashing is a promising strategy, which utilizes binary encoding to represent documents, obtaining attractive performance. This paper makes the first attempt to utilize Bayesian Clustering for Text Hashing, dubbed as BCTH. Specifically, BCTH is able to map documents to binary codes by utilizing multiple Bayesian Clusterings in parallel, where each Bayesian Clustering is responsible for one bit. Our approach employs the bit-balanced constraint to maximize the amount of information in each bit. Meanwhile, the bit-uncorrected constraint is adopted to keep the independence among all bits. The time complexity of BCTH is linear, where the hash codes and hash function are jointly learned. The experimental results, based on four widely-used datasets, demonstrate that BCTH is competitive, compared with currently competitive baselines in the perspective of both precision and training speed.