Larry Heck

2025

UNLEARN Efficient Removal of Knowledge in Large Language Models
Tyler Lizzo | Larry Heck
Findings of the Association for Computational Linguistics: NAACL 2025

Large Language Models (LLMs) excel in many Natural Language Processing tasks but are outperformed by specialized tools for certain tasks. This raises the question: Can we reduce redundant LLM parameters when using these tools? Given the size and high training costs of LLMs, it is essential to efficiently forget specific knowledge without retraining. This paper introduces UNLEARN, a novel method that uses subspace techniques to selectively remove knowledge without access to the original training data, without retraining, and with minimal impact to other tasks. Our results show that UNLEARN significantly outperforms previous methods for forgetting targeted (unwanted) knowledge while also preserving related (wanted) knowledge. We also propose LEARN, a complementary approach for targeted knowledge addition, which achieves fine-tuning accuracy comparable to Low-Rank Adaptation (LoRA) without degrading related task performance.

pdf bib abs

Hidden Forms: A Dataset to Fill Masked Interfaces from Language Commands
Anirudh Sundar | Christopher Richardson | William Gay | Benjamin Reichman | Larry Heck
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

This paper introduces Hidden Forms (hFORMS), a dataset of natural language commands paired with user interfaces with masked visual context. By obscuring specific UI elements, the dataset challenges Computer-Using Agents to parse natural language instructions and infer the correct bounding box locations by leveraging UI context. Furthermore, hFORMS contains three distinct masking strategies representing progressive difficulty levels. Additionally, we explore parameter-efficient fine-tuning approaches using Vision-Language models from the Llama and Qwen series, demonstrating that fine-tuning on mobile domains results in more than 5x improvement in zero-shot domain adaptation performance when identifying bounding boxes on the desktop and web domains.

pdf bib abs

Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
Saif Punjwani | Larry Heck
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

Large language models (LLMs) have demonstrated remarkable reasoning capabilities when prompted with strategies such as Chain-of-Thought (CoT). However, these approaches focus on token-level output without considering internal weight dynamics. We introduce Weight-of-Thought (WoT) reasoning, a novel approach that examines neural network weights before inference to identify reasoning pathways. Unlike existing methods, WoT explores the weight space through graph-based message passing, multi-step reasoning processes, and attention mechanisms. Our implementation creates an interconnected graph of reasoning nodes. Experiments on diverse reasoning tasks (syllogistic, mathematical, algebraic, combinatorial, and geometric) demonstrate that WoT achieves superior performance compared to traditional methods, particularly for complex problems. This approach leads to both improved performance and greater interpretability of the reasoning process, offering a promising direction for enhancing LLM reasoning capabilities.

pdf bib abs

iTBLS: A Dataset of Interactive Conversations Over Tabular Information
Anirudh Sundar | Christopher Richardson | Adar Avsian | Larry Heck
Proceedings of the 4th Table Representation Learning Workshop

This paper introduces Interactive Tables (iTBLS), a dataset of interactive conversations that focuses on natural-language manipulation of tabular information sourced from academic pre-prints on ArXiv. The iTBLS dataset consists of three types of tabular tasks – interpretation, modification, and generation. Interpretation focuses on tabular understanding, modification focuses on manipulating tabular information, and generation focuses on the addition of new natural-language evidence. In addition, the paper presents a novel framework that reformulates tabular operations as question-answering, where an appropriate question is formulated based on the nature of interaction and the question is answered using the user request as evidence. The developed approach results in an improvement on all tasks on a sequence-to-sequence modeling baseline on iTBLS. In addition, the question-answering-based reformulation is applied to datasets from prior work for the text-to-table task where textual paragraphs are summarized into tables. The novel approach results in up to 13% improvement in Exact-Match accuracy and up to 16% improvement in BERTScores compared to the prior state-of-the-art.

2024

pdf bib abs

Dense Passage Retrieval: Is it Retrieving?
Benjamin Reichman | Larry Heck
Findings of the Association for Computational Linguistics: EMNLP 2024

Large Language Models (LLMs) internally store repositories of knowledge. However, their access to this repository is imprecise and they frequently hallucinate information that is not true or does not exist. A paradigm called Retrieval Augmented Generation (RAG) promises to fix these issues. Dense passage retrieval (DPR) is the first step in this paradigm. In this paper, we analyze the role of DPR fine-tuning and how it affects the model being trained. DPR fine-tunes pre-trained networks to enhance the alignment of the embeddings between queries and relevant textual data. We explore DPR-trained models mechanistically by using a combination of probing, layer activation analysis, and model editing. Our experiments show that DPR training decentralizes how knowledge is stored in the network, creating multiple access pathways to the same information. We also uncover a limitation in this training style: the internal knowledge of the pre-trained model bounds what the retrieval model can retrieve. These findings suggest a few possible directions for dense retrieval: (1) expose the DPR training process to more knowledge so more can be decentralized, (2) inject facts as decentralized representations, (3) model and incorporate knowledge uncertainty in the retrieval process, and (4) directly map internal model knowledge to a knowledge base.

pdf bib abs

mForms : Multimodal Form Filling with Question Answering
Larry Heck | Simon Heck | Anirudh Sundar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a new approach to form-filling by reformulating the task as multimodal natural language Question Answering (QA). The reformulation is achieved by first translating the elements on the GUI form (text fields, buttons, icons, etc.) to natural language questions, where these questions capture the element’s multimodal semantics. After a match is determined between the form element (Question) and the user utterance (Answer), the form element is filled through a pre-trained extractive QA system. By leveraging pre-trained QA models and not requiring form-specific training, this approach to form-filling is zero-shot. The paper also presents an approach to further refine the form-filling by using multi-task training to incorporate a potentially large number of successive tasks. Finally, the paper introduces a multimodal natural language form-filling dataset Multimodal Forms (mForms), as well as a multimodal extension of the popular ATIS dataset to support future research and experimentation. Results show the new approach not only maintains robust accuracy for sparse training conditions but achieves state-of-the-art F1 of 0.97 on ATIS with approximately 1/10th the training data.

2023

pdf bib abs

cTBLS: Augmenting Large Language Models with Conversational Tables
Anirudh S. Sundar | Larry Heck
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)

Optimizing accuracy and performance while eliminating hallucinations of open-domain conversational large language models (LLMs) is an open research challenge. A particularly promising direction is to augment and ground LLMs with information from structured sources. This paper introduces Conversational Tables cTBLS, a three-step architecture to retrieve and generate dialogue responses grounded on retrieved tabular information. cTBLS uses Transformer encoder embeddings for Dense Table Retrieval and obtains up to 125% relative improvement over the retriever in the previous state-of-the-art system on the HyrbiDialogue dataset. cTBLS then uses a shared process between encoder and decoder models to perform a coarse+fine tabular knowledge (e.g., cell) ranking combined with a GPT-3.5 LLM response generator to yield a 2x relative improvement in ROUGE scores. Finally, human evaluators prefer cTBLs +80% of the time (coherency, fluency) and judge informativeness to be 4x better than the previous state-of-the-art.

pdf bib abs

Syndicom: Improving Conversational Commonsense with Error-Injection and Natural Language Feedback
Christopher Richardson | Anirudh Sundar | Larry Heck
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Commonsense reasoning is a critical aspect of human communication. Despite recent advances in conversational AI driven by large language models, commonsense reasoning remains a challenging task. In this work, we introduce Syndicom - a method for improving commonsense in dialogue response generation. Syndicom consists of two components. The first component is a dataset composed of commonsense dialogues created from a knowledge graph and synthesized into natural language. This dataset includes both valid and invalid responses to dialogue contexts, along with natural language feedback (NLF) for the invalid responses. The second contribution is a two-step procedure: training a model to predict natural language feedback (NLF) for invalid responses, and then training a response generation model conditioned on the predicted NLF, the invalid response, and the dialogue. Syndicom is scalable and does not require reinforcement learning. Empirical results on three tasks are evaluated using a broad range of metrics. Syndicom achieves a relative improvement of 53% over ChatGPT on ROUGE-1, and human evaluators prefer Syndicom over ChatGPT 57% of the time. We will publicly release the code and the full dataset.

2022

pdf bib abs

Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh Sundar | Larry Heck
Proceedings of the 4th Workshop on NLP for Conversational AI

As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste). We use these modalities, particularly sight and touch, to convey and interpret specific meanings. Multimodal expressions are central to conversations; a rich set of modalities amplify and often compensate for each other. A multimodal conversational AI system answers questions, fulfills tasks, and emulates human conversations by understanding and expressing itself via multiple modalities. This paper motivates, defines, and mathematically formulates the multimodal conversational research objective. We provide a taxonomy of research required to solve the objective: multimodal representation, fusion, alignment, translation, and co-learning. We survey state-of-the-art datasets and approaches for each research area and highlight their limiting assumptions. Finally, we identify multimodal co-learning as a promising direction for multimodal conversational AI research.

2021

pdf bib abs

Grounding Open-Domain Instructions to Automate Web Support Tasks
Nancy Xu | Sam Masling | Michael Du | Giovanni Campagna | Larry Heck | James Landay | Monica Lam
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Grounding natural language instructions on the web to perform previously unseen tasks enables accessibility and automation. We introduce a task and dataset to train AI agents from open-domain, step-by-step instructions originally written for people. We build RUSS (Rapid Universal Support Service) to tackle this problem. RUSS consists of two models: First, a BERT-LSTM with pointers parses instructions to WebLang, a domain-specific language we design for grounding natural language on the web. Then, a grounding model retrieves the unique IDs of any webpage elements requested in the WebLang. RUSS may interact with the user through a dialogue (e.g. ask for an address) or execute a web operation (e.g. click a button) inside the web runtime. To augment training, we synthesize natural language instructions mapped to WebLang. Our dataset consists of 80 different customer service problems from help websites, with a total of 741 step-by-step instructions and their corresponding actions. RUSS achieves 76.7% end-to-end accuracy predicting agent actions from single instructions. It outperforms state-of-the-art models that directly map instructions to actions without WebLang. Our user study shows that RUSS is preferred by actual users over web navigation.

2018

pdf bib abs

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Bing Liu | Gokhan Tür | Dilek Hakkani-Tür | Pararth Shah | Larry Heck
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. Efficiency of such learning method may suffer from the mismatch of dialogue state distribution between offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistake it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent’s capability in successfully completing a task.

2017

pdf bib abs

Sequential Dialogue Context Modeling for Spoken Language Understanding
Ankur Bapna | Gokhan Tür | Dilek Hakkani-Tür | Larry Heck
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Spoken Language Understanding (SLU) is a key component of goal oriented dialogue systems that would parse user utterances into semantic frame representations. Traditionally SLU does not utilize the dialogue history beyond the previous system turn and contextual ambiguities are resolved by the downstream components. In this paper, we explore novel approaches for modeling dialogue context in a recurrent neural network (RNN) based language understanding system. We propose the Sequential Dialogue Encoder Network, that allows encoding context from the dialogue history in chronological order. We compare the performance of our proposed architecture with two context models, one that uses just the previous turn context and another that encodes dialogue context in a memory network, but loses the order of utterances in the dialogue history. Experiments with a multi-domain dialogue dataset demonstrate that the proposed architecture results in reduced semantic frame error rates.