Trung Le
2024
Preserving Generalization of Language Models in Few-shot Continual Relation Extraction
Quyen Tran | Nguyen Xuan Thanh | Nguyen Hoang Anh | Nam Le Hai | Trung Le | Linh Van Ngo | Thien Huu Nguyen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Few-shot Continual Relation Extraction (FCRE) is an emerging and dynamic area of study where models can sequentially integrate knowledge from new relations with limited labeled data while circumventing catastrophic forgetting and preserving prior knowledge from pre-trained backbones. In this work, we introduce a novel method that leverages often-discarded language model heads. By employing these components via a mutual information maximization strategy, our approach helps maintain prior knowledge from the pre-trained backbone and strategically aligns the primary classification head, thereby enhancing model performance. Furthermore, we explore the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges. Our comprehensive experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.
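The abstract names a mutual information maximization strategy between the classification head and the retained LM head but gives no objective here. Below is a minimal, hypothetical sketch of one standard way to realize such an objective, an InfoNCE lower bound in PyTorch; the function name, feature shapes, and temperature are illustrative assumptions, not the paper's actual loss.

```python
# Hypothetical sketch: InfoNCE-style mutual information maximization
# between features from the classification head path (z_cls) and the
# otherwise-discarded LM head path (z_lm). Not the paper's exact loss.
import torch
import torch.nn.functional as F

def infonce_mi_bound(z_cls, z_lm, temperature=0.1):
    # Normalize both views so similarities are cosine similarities.
    z_cls = F.normalize(z_cls, dim=-1)
    z_lm = F.normalize(z_lm, dim=-1)
    # Pairwise similarity matrix; matching pairs sit on the diagonal.
    logits = z_cls @ z_lm.t() / temperature
    targets = torch.arange(z_cls.size(0))
    # Minimizing this cross-entropy maximizes an InfoNCE lower bound
    # on the mutual information I(z_cls; z_lm).
    return F.cross_entropy(logits, targets)

# Toy usage with random paired features for a batch of 8 samples.
z_a = torch.randn(8, 128, requires_grad=True)
z_b = torch.randn(8, 128)
loss = infonce_mi_bound(z_a, z_b)
loss.backward()  # gradients would flow into the shared encoder in training
```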
2020
Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
Quan Hung Tran | Nhan Dam | Tuan Lai | Franck Dernoncourt | Trung Le | Nham Le | Dinh Phung
Proceedings of the 28th International Conference on Computational Linguistics
Interpretability and explainability of deep neural models remain challenging due to their size and complexity. Many previous works have focused on visualizing internal components of neural networks to represent them through human-friendly concepts. In real life, on the other hand, when making a decision, humans tend to rely on similar situations encountered in the past. We therefore argue that one promising way to make a model interpretable and explainable is to design it so that it explicitly connects the current sample with previously seen samples and bases its decision on them. In this work, we design one such model: an explainable, evidence-based memory network architecture that learns to summarize the dataset and extract supporting evidence for its decisions. The model achieves state-of-the-art performance on two popular question answering datasets, TrecQA and WikiQA. Via further analysis, we show that the model can reliably trace errors made during validation back to the training instances that likely caused them. We believe this error-tracing capability can help improve dataset quality in many applications.
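The abstract describes the architecture only at a high level. The sketch below illustrates the general idea of an evidence-based memory in PyTorch: retrieve the stored training samples most similar to a query, let them vote on the answer, and return them as human-readable evidence. The class, its fields, and the k-NN voting rule are assumptions for illustration, not the paper's actual model.

```python
# Hypothetical sketch of an evidence-based memory: predictions are grounded
# in retrieved training instances, which also support error tracing.
import torch
import torch.nn.functional as F

class EvidenceMemory:
    """Stores encoded training samples and answers queries with evidence."""

    def __init__(self, keys, labels, texts):
        self.keys = F.normalize(keys, dim=-1)  # [n, dim] encoded training samples
        self.labels = labels                   # [n] gold labels
        self.texts = texts                     # raw instances, shown as evidence

    def predict_with_evidence(self, query, k=3):
        # Cosine similarity between the query and every stored memory slot.
        sims = F.normalize(query, dim=-1) @ self.keys.t()
        top = sims.topk(k, dim=-1).indices.squeeze(0)
        # Majority vote over the nearest neighbours decides the answer...
        pred = self.labels[top].mode().values.item()
        # ...and those same neighbours double as human-readable evidence.
        evidence = [self.texts[i] for i in top.tolist()]
        return pred, evidence

# Toy usage: four stored QA pairs, one incoming query.
memory = EvidenceMemory(
    keys=torch.randn(4, 16),
    labels=torch.tensor([1, 0, 1, 1]),
    texts=["qa pair 1", "qa pair 2", "qa pair 3", "qa pair 4"],
)
label, evidence = memory.predict_with_evidence(torch.randn(1, 16))
print(label, evidence)  # a wrong answer traces back to the listed instances
```

Because every prediction is tied to concrete memory slots, a validation error can be traced back to the retrieved training instances, which is the error-tracing behavior the abstract highlights.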