Trung Le
2024
Preserving Generalization of Language Models in Few-shot Continual Relation Extraction
Quyen Tran | Nguyen Xuan Thanh | Nguyen Hoang Anh | Nam Le Hai | Trung Le | Linh Van Ngo | Thien Huu Nguyen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Few-shot Continual Relation Extraction (FCRE) is an emerging and dynamic area of study where models can sequentially integrate knowledge from new relations with limited labeled data while circumventing catastrophic forgetting and preserving prior knowledge from pre-trained backbones. In this work, we introduce a novel method that leverages often-discarded language model heads. By employing these components via a mutual information maximization strategy, our approach helps maintain prior knowledge from the pre-trained backbone and strategically aligns the primary classification head, thereby enhancing model performance. Furthermore, we explore the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges. Our comprehensive experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.
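The abstract names a mutual information maximization strategy between the classification head and the retained LM head but gives no objective here. Below is a minimal, hypothetical sketch of one standard way to realize such an objective, an InfoNCE lower bound in PyTorch; the function name, feature shapes, and temperature are illustrative assumptions, not the paper's actual loss.

```python
# Hypothetical sketch: InfoNCE-style mutual information maximization
# between features from the classification head path (z_cls) and the
# otherwise-discarded LM head path (z_lm). Not the paper's exact loss.
import torch
import torch.nn.functional as F

def infonce_mi_bound(z_cls, z_lm, temperature=0.1):
    # Normalize both views so similarities are cosine similarities.
    z_cls = F.normalize(z_cls, dim=-1)
    z_lm = F.normalize(z_lm, dim=-1)
    # Pairwise similarity matrix; matching pairs sit on the diagonal.
    logits = z_cls @ z_lm.t() / temperature
    targets = torch.arange(z_cls.size(0))
    # Minimizing this cross-entropy maximizes an InfoNCE lower bound
    # on the mutual information I(z_cls; z_lm).
    return F.cross_entropy(logits, targets)

# Toy usage with random paired features for a batch of 8 samples.
z_a = torch.randn(8, 128, requires_grad=True)
z_b = torch.randn(8, 128)
loss = infonce_mi_bound(z_a, z_b)
loss.backward()  # gradients would flow into the shared encoder in training
```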
2020
Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
Quan Hung Tran | Nhan Dam | Tuan Lai | Franck Dernoncourt | Trung Le | Nham Le | Dinh Phung
Proceedings of the 28th International Conference on Computational Linguistics
Interpretability and explainability of deep neural models remain challenging due to their size and complexity. Many previous works have focused on visualizing internal components of neural networks to represent them through human-friendly concepts. In real life, on the other hand, when making a decision, humans tend to rely on similar situations encountered in the past. We therefore argue that one promising way to make a model interpretable and explainable is to design it so that it explicitly connects the current sample with previously seen samples and bases its decision on them. In this work, we design one such model: an explainable, evidence-based memory network architecture that learns to summarize the dataset and extract supporting evidence for its decisions. The model achieves state-of-the-art performance on two popular question answering datasets, TrecQA and WikiQA. Via further analysis, we show that the model can reliably trace errors made during validation back to the training instances that likely caused them. We believe this error-tracing capability can help improve dataset quality in many applications.
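The abstract describes the architecture only at a high level. The sketch below illustrates the general idea of an evidence-based memory in PyTorch: retrieve the stored training samples most similar to a query, let them vote on the answer, and return them as human-readable evidence. The class, its fields, and the k-NN voting rule are assumptions for illustration, not the paper's actual model.

```python
# Hypothetical sketch of an evidence-based memory: predictions are grounded
# in retrieved training instances, which also support error tracing.
import torch
import torch.nn.functional as F

class EvidenceMemory:
    """Stores encoded training samples and answers queries with evidence."""

    def __init__(self, keys, labels, texts):
        self.keys = F.normalize(keys, dim=-1)  # [n, dim] encoded training samples
        self.labels = labels                   # [n] gold labels
        self.texts = texts                     # raw instances, shown as evidence

    def predict_with_evidence(self, query, k=3):
        # Cosine similarity between the query and every stored memory slot.
        sims = F.normalize(query, dim=-1) @ self.keys.t()
        top = sims.topk(k, dim=-1).indices.squeeze(0)
        # Majority vote over the nearest neighbours decides the answer...
        pred = self.labels[top].mode().values.item()
        # ...and those same neighbours double as human-readable evidence.
        evidence = [self.texts[i] for i in top.tolist()]
        return pred, evidence

# Toy usage: four stored QA pairs, one incoming query.
memory = EvidenceMemory(
    keys=torch.randn(4, 16),
    labels=torch.tensor([1, 0, 1, 1]),
    texts=["qa pair 1", "qa pair 2", "qa pair 3", "qa pair 4"],
)
label, evidence = memory.predict_with_evidence(torch.randn(1, 16))
print(label, evidence)  # a wrong answer traces back to the listed instances
```

Because every prediction is tied to concrete memory slots, a validation error can be traced back to the retrieved training instances, which is the error-tracing behavior the abstract highlights.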