Gabriel Murray

2025

Explicit Bayesian Inference to Uncover the Latent Themes of Large Language Models
Raymond Li | Chuyuan Li | Gabriel Murray | Giuseppe Carenini
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) have demonstrated impressive generative capabilities, yet their inner mechanisms remain largely opaque. In this work, we introduce a novel approach to interpret the LLM generation process through the lens of an explicit Bayesian framework by inferring latent topic variables via variational inference. Specifically, we leverage a variational autoencoder-based neural topic model to dynamically approximate the posterior distribution over the high-level latent topic variables at each generation step. By reconstructing the LLM’s next-token predictions through these latent topics and maintaining a regularized latent space, our method not only yields interpretable and diverse topic representations but also effectively captures semantic shifts throughout the text. We validate our approach on multiple datasets, showing that our latent topics outperform state-of-the-art topic models on intrinsic measures of coherence and diversity. Furthermore, we demonstrate the utility of our approach in downstream applications by using the inferred topic distributions to retrieve relevant demonstration examples for in-context learning, resulting in significant gains on classification and summarization tasks.

Hierarchical Attention Adapter for Abstractive Dialogue Summarization
Raymond Li | Chuyuan Li | Gabriel Murray | Giuseppe Carenini
Proceedings of The 5th New Frontiers in Summarization Workshop

Dialogue summarization is still a very challenging task even for large language models (LLMs). On the one hand, some previous approaches have pre-trained language models specifically for dialogue understanding and summarization, but they have been limited to relatively small models. On the other hand, other works have tried to directly exploit the dialogue semantics and discourse structures in their modeling effort, but by construction, they require access to those structures, which is in itself a largely unsolved problem. In this paper, we combine these two ideas in an approach that can be seamlessly integrated into the decoder-only architecture adopted by most state-of-the-art LLMs. In particular, our novel solution leverages the parameter-efficient fine-tuning (PEFT) paradigm to model the hierarchical structure of dialogues, where input sequences are naturally segmented into dialogue turns, and then fine-tunes the model for abstractive summarization. From experiments on two datasets, we find that the Hierarchical Attention Adapter outperforms all baseline adapter methods on SummScreen, and that it can also be combined with LoRA to achieve the best performance on SamSum.

2023

Diversity-Aware Coherence Loss for Improving Neural Topic Models
Raymond Li | Felipe Gonzalez-Pizarro | Linzi Xing | Gabriel Murray | Giuseppe Carenini
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The standard approach for neural topic modeling uses a variational autoencoder (VAE) framework that jointly minimizes the KL divergence between the estimated posterior and prior, in addition to the reconstruction loss. Since neural topic models are trained by recreating individual input documents, they do not explicitly capture the coherence between words on the corpus level. In this work, we propose a novel diversity-aware coherence loss that encourages the model to learn corpus-level coherence scores while maintaining high diversity between topics. Experimental results on multiple datasets show that our method significantly improves the performance of neural topic models without requiring any pretraining or additional parameters.
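
The standard VAE objective described in this abstract can be sketched as follows. This is a minimal illustration, assuming a diagonal Gaussian posterior, a standard normal prior, and a bag-of-words reconstruction term; the function names are hypothetical, and the sketch does not implement the paper's diversity-aware coherence loss.

```python
import math


def gaussian_kl(mu, logvar):
    """KL(q || p) for q = N(mu, diag(exp(logvar))) and p = N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reconstruction_loss(bow_counts, word_probs, eps=1e-12):
    """Negative log-likelihood of the document's word counts under the decoder."""
    return -sum(c * math.log(p + eps) for c, p in zip(bow_counts, word_probs))


def vae_topic_loss(bow_counts, word_probs, mu, logvar):
    """Reconstruction loss plus KL term, the two parts named in the abstract."""
    return reconstruction_loss(bow_counts, word_probs) + gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # Toy document over a three-word vocabulary (values invented for illustration).
    loss = vae_topic_loss(
        bow_counts=[2, 0, 1],
        word_probs=[0.6, 0.1, 0.3],
        mu=[0.2, -0.1],
        logvar=[-0.5, 0.1],
    )
    print(f"loss = {loss:.4f}")
```

Both terms are computed per document, which is why an objective of this form does not by itself capture corpus-level word coherence.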

Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models
Raymond Li | Gabriel Murray | Giuseppe Carenini
Findings of the Association for Computational Linguistics: EMNLP 2023

In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a small, fixed number of steps before pruning the experts based on their importance scores. Our experimental results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we provide further analysis of the experts selected by each model at each layer to offer insights for future studies.
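
The Gumbel-Softmax gating mentioned in this abstract can be illustrated with a short sketch. It assumes one learnable logit per expert and a fixed temperature; the names `gumbel_softmax` and `mix_experts` are hypothetical, and the paper's actual gate parametrization and pruning procedure are not reproduced here.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over the experts scored by `logits`."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 so both logarithms are defined.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_outputs, gates):
    """Combine per-expert output vectors using the gate weights."""
    dim = len(expert_outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, expert_outputs)) for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    gates = gumbel_softmax([0.5, 1.5, -1.0], temperature=0.5, rng=rng)
    outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy expert outputs
    print(gates, mix_experts(outputs, gates))
```

Lower temperatures push the gate weights toward a one-hot choice, while higher temperatures spread weight more evenly across experts.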

2022

Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li | Wen Xiao | Linzi Xing | Lanjun Wang | Gabriel Murray | Giuseppe Carenini
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The multi-head self-attention mechanism of the transformer model has been thoroughly investigated recently. In one vein of study, researchers are interested in understanding why and how transformers work. In another vein, researchers propose new attention augmentation methods to make transformers more accurate, efficient and interpretable. In this paper, we combine these two lines of research in a human-in-the-loop pipeline to first discover important task-specific attention patterns. Those patterns are then injected not only into smaller models but also into the original model. The benefits of our pipeline and discovered patterns are demonstrated in two case studies with extractive summarization and topic segmentation. After discovering interpretable patterns in BERT-based models fine-tuned for the two downstream tasks, experiments indicate that when we inject the patterns into attention heads, the models show considerable improvements in accuracy and efficiency.

2019

Discourse Analysis and Its Applications
Shafiq Joty | Giuseppe Carenini | Raymond Ng | Gabriel Murray
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Discourse processing is a suite of Natural Language Processing (NLP) tasks to uncover linguistic structures from texts at several levels, which can support many downstream applications. This involves identifying the topic structure, the coherence structure, the coreference structure, and the conversation structure for conversational discourse. Taken together, these structures can inform text summarization, machine translation, essay scoring, sentiment analysis, information extraction, question answering, and thread recovery. The tutorial starts with an overview of basic concepts in discourse analysis – monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures in discourse analysis. We also give an overview of linguistic structures and corresponding discourse analysis tasks that discourse researchers are generally interested in, as well as key applications on which these discourse structures have an impact.

2018

NLP for Conversations: Sentiment, Summarization, and Group Dynamics
Gabriel Murray | Giuseppe Carenini | Shafiq Joty
Proceedings of the 27th International Conference on Computational Linguistics: Tutorial Abstracts

Language-Based Automatic Assessment of Cognitive and Communicative Functions Related to Parkinson’s Disease
Lesley Jessiman | Gabriel Murray | McKenzie Braley
Proceedings of the First International Workshop on Language Cognition and Computational Models

We explore the use of natural language processing and machine learning for detecting evidence of Parkinson’s disease from transcribed speech of subjects who are describing everyday tasks. Experiments reveal the difficulty of treating this as a binary classification task, and a multi-class approach yields superior results. We also show that these models can be used to predict cognitive abilities across all subjects.

2017

Detecting Dementia through Retrospective Analysis of Routine Blog Posts by Bloggers with Dementia
Vaden Masrani | Gabriel Murray | Thalia Field | Giuseppe Carenini
Proceedings of the 16th BioNLP Workshop

We investigate if writers with dementia can be automatically distinguished from those without by analyzing linguistic markers in written text, in the form of blog posts. We have built a corpus of several thousand blog posts, some by people with dementia and others by people with loved ones with dementia. We use this dataset to train and test several machine learning methods, and achieve prediction performance at a level far above the baseline.

Modelling Participation in Small Group Social Sequences with Markov Rewards Analysis
Gabriel Murray
Proceedings of the Second Workshop on NLP and Computational Social Science

We explore a novel computational approach for analyzing member participation in small group social sequences. Using a complex state representation combining information about dialogue act types, sentiment expression, and participant roles, we explore which sequence states are associated with high levels of member participation. Using a Markov Rewards framework, we associate particular states with immediate positive and negative rewards, and employ a Value Iteration algorithm to calculate the expected value of all states. In our findings, we focus on discourse states belonging to team leaders and project managers which are either very likely or very unlikely to lead to participation from the rest of the group members.
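
The value computation described in this abstract can be sketched on a toy example. This is a minimal illustration, assuming a three-state Markov reward process whose states, transition probabilities, rewards, and discount factor are all invented here; the paper's actual state representation combines dialogue acts, sentiment, and participant roles and is far richer.

```python
# Hypothetical three-state Markov reward process; every number below is
# invented for illustration and does not come from the paper.
STATES = ["leader_question", "member_response", "silence"]

# TRANSITIONS[s][t] = probability of moving from state s to state t.
TRANSITIONS = {
    "leader_question": {"leader_question": 0.1, "member_response": 0.7, "silence": 0.2},
    "member_response": {"leader_question": 0.4, "member_response": 0.5, "silence": 0.1},
    "silence": {"leader_question": 0.6, "member_response": 0.1, "silence": 0.3},
}

# Immediate reward: positive when a group member participates, negative for silence.
REWARDS = {"leader_question": 0.0, "member_response": 1.0, "silence": -1.0}


def value_iteration(states, transitions, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V(s) = R(s) + gamma * sum_t P(t | s) * V(t) until the values settle."""
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_values = {
            s: rewards[s] + gamma * sum(p * values[t] for t, p in transitions[s].items())
            for s in states
        }
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    result = value_iteration(STATES, TRANSITIONS, REWARDS)
    for state in sorted(result, key=result.get, reverse=True):
        print(f"{state:16s} {result[state]:+.3f}")
```

States with high expected value are those likely to lead to future participation, which is the kind of ranking the abstract refers to.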
