Raymond Li

2025

Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer’s Disease Detection
Chuyuan Li | Raymond Li | Thalia S. Field | Giuseppe Carenini
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder that leads to dementia, and early intervention can greatly benefit from analyzing linguistic abnormalities. In this work, we explore the potential of Large Language Models as health assistants for AD diagnosis from patient-generated text using in-context learning (ICL), where tasks are defined through a few input-output examples. Empirical results reveal that conventional ICL methods, such as similarity-based selection, perform poorly for AD diagnosis, likely due to the inherent complexity of this task. To address this, we introduce Delta-KNN, a novel demonstration selection strategy that enhances ICL performance. Our method leverages a delta score to assess the relative gains of each training example, coupled with a KNN-based retriever that dynamically selects optimal “representatives” for a given input. Experiments on two AD detection datasets across three models demonstrate that Delta-KNN consistently outperforms existing ICL baselines. Notably, when using the Llama-3.1 model, our approach achieves new state-of-the-art results, surpassing even supervised classifiers.

While in-context learning (ICL) has proven to be an effective technique to improve the performance of Large Language Models (LLMs) in a variety of complex tasks, notably in translating natural language questions into Structured Query Language (NL2SQL), the question of how to select the most beneficial demonstration examples remains an open research problem. While prior works often adapted off-the-shelf encoders to retrieve examples dynamically, an inherent discrepancy exists in the representational capacities between the external retrievers and the LLMs. Further, optimizing the selection of examples is a non-trivial task, since there are no straightforward methods to assess the relative benefits of examples without performing pairwise inference. To address these shortcomings, we propose Detriever, a novel demonstration retrieval framework that learns a weighted combination of LLM hidden states, where rich semantic information is encoded. To train the model, we propose a proxy score that estimates the relative benefits of examples based on the similarities between output queries. Experiments on two popular NL2SQL benchmarks demonstrate that our method significantly outperforms the state-of-the-art baselines for the NL2SQL tasks.

CEMTM: Contextual Embedding-based Multimodal Topic Modeling
Amirhossein Abaskohi | Raymond Li | Chuyuan Li | Shafiq Joty | Giuseppe Carenini
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language models (LVLMs) to obtain contextualized embeddings, and employs a distributional attention mechanism to weight token-level contributions to topic inference. A reconstruction objective aligns topic-based representations with the document embedding, encouraging semantic consistency across modalities. Unlike existing approaches, CEMTM can process multiple images per document without repeated encoding and maintains interpretability through explicit word-topic and document-topic distributions. Extensive experiments on six multimodal benchmarks show that CEMTM consistently outperforms unimodal and multimodal baselines, achieving a remarkable average LLM score of 2.61. Further analysis shows its effectiveness in downstream few-shot retrieval and its ability to capture visually grounded semantics in complex domains such as scientific articles.

Explicit Bayesian Inference to Uncover the Latent Themes of Large Language Models
Raymond Li | Chuyuan Li | Gabriel Murray | Giuseppe Carenini
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) have demonstrated impressive generative capabilities, yet their inner mechanisms remain largely opaque. In this work, we introduce a novel approach to interpret the LLM generation process through the lens of an explicit Bayesian framework by inferring latent topic variables via variational inference. Specifically, we leverage a variational autoencoder-based neural topic model to dynamically approximate the posterior distribution over the high-level latent topic variables at each generation step. By reconstructing the LLM’s next-token predictions through these latent topics and maintaining a regularized latent space, our method not only yields interpretable and diverse topic representations but also effectively captures semantic shifts throughout the text. We validate our approach on multiple datasets, showing that our latent topics outperform state-of-the-art topic models on intrinsic measures of coherence and diversity. Furthermore, we demonstrate the utility of our approach in downstream applications by using the inferred topic distributions to retrieve relevant demonstration examples for in-context learning, resulting in significant gains on classification and summarization tasks.

Hierarchical Attention Adapter for Abstractive Dialogue Summarization
Raymond Li | Chuyuan Li | Gabriel Murray | Giuseppe Carenini
Proceedings of The 5th New Frontiers in Summarization Workshop

Dialogue summarization is still a very challenging task even for large language models (LLMs). On the one hand, some previous approaches have pre-trained language models specifically for dialogue understanding and summarization, but they have been limited to relatively small models. On the other hand, other works have tried to directly exploit the dialogue semantics and discourse structures in their modeling effort, but by construction, they require access to those structures, which is in itself a largely unsolved problem. In this paper, we synergistically combine these two ideas in an approach that can be seamlessly integrated into the decoder-only architecture adopted by most state-of-the-art LLMs. In particular, our novel solution leverages the parameter-efficient fine-tuning (PEFT) paradigm to model the hierarchical structure of dialogues, where input sequences are naturally segmented into dialogue turns, and then fine-tunes the model for abstractive summarization. From experiments on two datasets, we find that Hierarchical Attention Adapter outperforms all baseline adapter methods on SummScreen, where our approach can also be combined with LoRA to achieve the best performance on SamSum.

Recent advances in test-time scaling have shown promising results in improving large language model performance through strategic computation allocation during inference. While this approach has demonstrated strong improvements in reasoning tasks, its application to natural language generation tasks, particularly summarization, remains unexplored. Among all of the generation tasks, multi-document summarization (MDS) presents unique challenges by requiring models to extract and synthesize essential information across multiple lengthy documents. Unlike reasoning tasks, MDS demands a more complicated approach to prompt design and ensemble methods, as no single “best-overall” prompt can satisfy diverse summarization requirements. The inherent diversity in summarization needs necessitates exploring how different prompting strategies can be systematically combined to improve performance. We propose a novel framework that harnesses prompt diversity to enhance MDS performance. Our approach generates multiple candidate summaries using carefully designed prompt variations, then ensembles them through sophisticated aggregation methods to produce refined summaries. This prompt diversity enables models to capture different aspects and perspectives of the source documents, leading to more comprehensive and higher-quality summaries. To evaluate our method effectively, we also introduce two new LLM-based metrics: the Preference Alignment Score (PAS) and LLM Atom-Content-Unit score (LLM-ACU), which assess summary quality while addressing the positional bias inherent in automatic evaluations performed by LLMs. Our experiments demonstrate that leveraging prompt diversity significantly enhances summary quality, while also revealing the practical scaling boundaries for MDS tasks.

2023

Diversity-Aware Coherence Loss for Improving Neural Topic Models
Raymond Li | Felipe Gonzalez-Pizarro | Linzi Xing | Gabriel Murray | Giuseppe Carenini
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The standard approach for neural topic modeling uses a variational autoencoder (VAE) framework that jointly minimizes the KL divergence between the estimated posterior and prior, in addition to the reconstruction loss. Since neural topic models are trained by recreating individual input documents, they do not explicitly capture the coherence between words on the corpus level. In this work, we propose a novel diversity-aware coherence loss that encourages the model to learn corpus-level coherence scores while maintaining high diversity between topics. Experimental results on multiple datasets show that our method significantly improves the performance of neural topic models without requiring any pretraining or additional parameters.

Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models
Raymond Li | Gabriel Murray | Giuseppe Carenini
Findings of the Association for Computational Linguistics: EMNLP 2023

In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experimental results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we analyze the experts selected by each model at each layer to provide insights for future studies.

2022

Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li | Wen Xiao | Linzi Xing | Lanjun Wang | Gabriel Murray | Giuseppe Carenini
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The multi-head self-attention mechanism of the transformer model has been thoroughly investigated recently. In one vein of study, researchers are interested in understanding why and how transformers work. In another vein, researchers propose new attention augmentation methods to make transformers more accurate, efficient and interpretable. In this paper, we combine these two lines of research in a human-in-the-loop pipeline to first discover important task-specific attention patterns. Then those patterns are injected, not only to smaller models, but also to the original model. The benefits of our pipeline and discovered patterns are demonstrated in two case studies with extractive summarization and topic segmentation. After discovering interpretable patterns in BERT-based models fine-tuned for the two downstream tasks, experiments indicate that when we inject the patterns into attention heads, the models show considerable improvements in accuracy and efficiency.

2021

T3-Vis: visual analytic for Training and fine-Tuning Transformers in NLP
Raymond Li | Wen Xiao | Lanjun Wang | Hyeju Jang | Giuseppe Carenini
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Transformers are the dominant architecture in NLP, but their training and fine-tuning are still very challenging. In this paper, we present the design and implementation of a visual analytic framework for assisting researchers in this process, by providing them with valuable insights about the model’s intrinsic properties and behaviours. Our framework offers an intuitive overview that allows the user to explore different facets of the model (e.g., hidden states, attention) through interactive visualization, and provides a suite of built-in algorithms that compute the importance of model components and different parts of the input sequence. Case studies and feedback from a user focus group indicate that the framework is useful, and suggest several improvements. Our framework is available at: https://github.com/raymondzmc/T3-Vis.

DuoRAT: Towards Simpler Text-to-SQL Models
Torsten Scholak | Raymond Li | Dzmitry Bahdanau | Harm de Vries | Chris Pal
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent neural text-to-SQL models can effectively translate natural language questions to corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have proposed increasingly sophisticated solutions to the problem. Contrary to this trend, in this paper we focus on simplifications. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as the building blocks. We perform several ablation experiments using DuoRAT as the baseline model. Our experiments confirm the usefulness of some techniques and point out the redundancy of others, including structural SQL features and features that link the question with the schema.

2020

On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
Jonathan Pilault | Raymond Li | Sandeep Subramanian | Chris Pal
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. We provide extensive comparisons with strong baseline methods, prior state-of-the-art work, as well as multiple variants of our approach including those using only transformers, only extractive techniques and combinations of the two. We examine these models using four different summarization tasks and datasets: arXiv papers, PubMed papers, the Newsroom and BigPatent datasets. We find that transformer-based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human-generated abstracts. We include a human evaluation, finding that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. We hope that these architectures and experiments may serve as strong points of comparison for future work. Note: The abstract above was collaboratively written by the authors and one of the models presented in this paper based on an earlier draft of this paper.
