Ensuring that Large Language Models (LLMs) generate text representative of diverse sub-populations is essential, particularly when key concepts related to under-represented groups are scarce in the training data. We address this challenge with a novel clustering-based active learning framework enhanced with knowledge distillation. The proposed framework transforms the intermediate outputs of the learner model, enabling effective active learning for generative tasks for the first time. The integration of clustering and knowledge distillation yields more representative models without prior knowledge of the underlying data distribution or excessive human effort. We validate our approach in practice through case studies in counter-narration and style transfer. We construct two new datasets in tandem with model training, showing a performance improvement of 2%–10% over baseline models. Our results also show more consistent performance across various data subgroups and increased lexical diversity, underscoring our model's resilience to skewness in the available data. Further, we show that the data acquired via our approach improves the performance of secondary models not involved in the learning loop, demonstrating the practical utility of the framework.
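To make the clustering-based acquisition step concrete, the sketch below shows one common way such a loop can select diverse samples for annotation: embed the unlabeled pool, cluster it, and take the points nearest each centroid. The embedding model, clustering algorithm, and per-cluster quota are illustrative assumptions only; the abstract's actual framework additionally transforms the learner's intermediate outputs and uses knowledge distillation, which this sketch does not attempt to reproduce.

    # Minimal sketch of cluster-based sample selection for active learning.
    # Assumptions (not from the abstract): sentence-transformers embeddings
    # and scikit-learn KMeans; the paper's selection criterion may differ.
    import numpy as np
    from sklearn.cluster import KMeans
    from sentence_transformers import SentenceTransformer

    def select_for_annotation(unlabeled_texts, n_clusters=10, per_cluster=5):
        # Embed the unlabeled pool (embedding model is an illustrative choice).
        encoder = SentenceTransformer("all-MiniLM-L6-v2")
        embeddings = encoder.encode(unlabeled_texts, normalize_embeddings=True)

        # Cluster the pool so that each (possibly under-represented) region
        # of the data contributes candidates for annotation.
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        labels = km.fit_predict(embeddings)

        selected = []
        for c in range(n_clusters):
            idx = np.where(labels == c)[0]
            # Pick the points closest to the centroid as cluster representatives.
            dists = np.linalg.norm(embeddings[idx] - km.cluster_centers_[c], axis=1)
            selected.extend(idx[np.argsort(dists)[:per_cluster]].tolist())
        return selected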
General-purpose automatic speech recognition (ASR) systems do not always perform well in goal-oriented dialogue. Existing ASR correction methods rely on prior user data or named entities. We extend correction to tasks that have no prior user data and exhibit linguistic flexibility, such as lexical and syntactic variations. We propose a novel context augmentation with a large language model and a ranking strategy that incorporates contextual information from the dialogue states of a goal-oriented conversational AI and its tasks. Our method ranks (1) n-best ASR hypotheses by their lexical and semantic similarity to the context and (2) the context by its phonetic correspondence with the ASR hypotheses. Evaluated in the home improvement and cooking domains with real-world users, our method improves the recall and F1 of correction by 34% and 16%, respectively, while maintaining precision and false-positive rate. Users rated the system 0.8–1 point (out of 5) higher when our correction method worked properly, with no decrease due to false positives.
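The following sketch illustrates the general shape of context-aware re-ranking of n-best ASR hypotheses: each hypothesis is scored against dialogue-state context terms by lexical, semantic, and phonetic similarity, and the list is reordered. The libraries (difflib, jellyfish, sentence-transformers), Metaphone codes, and score weights are illustrative assumptions and not the abstract's exact ranking strategy.

    # Minimal sketch of context-aware re-ranking of n-best ASR hypotheses.
    # Assumptions (not from the abstract): difflib for lexical similarity,
    # sentence-transformers for semantic similarity, and Metaphone codes
    # (via jellyfish) for phonetic correspondence; weights are illustrative.
    from difflib import SequenceMatcher
    import jellyfish
    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

    def rerank_hypotheses(nbest, context_terms, w_lex=0.3, w_sem=0.4, w_phon=0.3):
        """Score each ASR hypothesis against dialogue-state context terms."""
        ctx_emb = encoder.encode(context_terms, convert_to_tensor=True)
        scored = []
        for hyp in nbest:
            hyp_emb = encoder.encode(hyp, convert_to_tensor=True)
            sem = float(util.cos_sim(hyp_emb, ctx_emb).max())          # semantic
            lex = max(SequenceMatcher(None, hyp.lower(), c.lower()).ratio()
                      for c in context_terms)                           # lexical
            phon = max(SequenceMatcher(None, jellyfish.metaphone(hyp),
                                       jellyfish.metaphone(c)).ratio()
                       for c in context_terms)                          # phonetic
            scored.append((w_lex * lex + w_sem * sem + w_phon * phon, hyp))
        return [h for _, h in sorted(scored, reverse=True)]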
Large language models (LLMs) are capable of generating well-formed responses, but using LLMs to generate responses on the fly is not yet feasible for many task-oriented systems. Modular architectures are often still required to provide safety and privacy guarantees on the output. We hypothesize that an offline generation approach using discourse theories, formal grammar rules, and LLMs can allow us to generate human-like, coherent text in a more efficient, robust, and inclusive manner within a task-oriented setting. To this end, we present the first discourse-aware multimodal task-oriented dialogue system that combines discourse theories with offline LLM generation. We deploy our bot as a publicly available app and track user ratings over six months. Our user ratings show an improvement from 2.8 to 3.5 out of 5 with the introduction of discourse coherence theories. We also show that our model reduces misunderstandings of African-American Vernacular English from 93% to 57%. While terms of use prevent us from releasing our entire codebase, we release our code in a format that can be integrated into most existing dialogue systems.
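At a very high level, the offline-generation idea can be pictured as the sketch below: LLM-drafted responses are produced and vetted ahead of deployment, then served from a cache at runtime, so no on-the-fly generation is needed. Every function name, prompt, and check here is a hypothetical stand-in; the abstract's discourse-coherence and formal-grammar machinery is not specified and is far richer than this.

    # Minimal sketch of offline response generation with runtime lookup.
    # Assumptions (not from the abstract): the cache layout, prompt, and
    # validation hook are illustrative placeholders.
    import json

    def generate_offline(dialogue_states, llm_generate, validate_grammar):
        """Build a state -> response table before deployment."""
        cache = {}
        for state in dialogue_states:
            prompt = (f"Write a short, coherent reply for dialogue state "
                      f"'{state}', connecting explicitly to the prior turn.")
            candidates = llm_generate(prompt, n=5)          # offline LLM call
            # Keep only candidates that pass the grammar / safety check.
            approved = [c for c in candidates if validate_grammar(c)]
            cache[state] = approved[0] if approved else None
        return cache

    def respond(state, cache, fallback="Sorry, could you rephrase that?"):
        """Runtime lookup: no on-the-fly LLM generation is needed."""
        return cache.get(state) or fallback

    # Example usage with stub components:
    if __name__ == "__main__":
        states = ["confirm_recipe_step", "clarify_tool_choice"]
        cache = generate_offline(
            states,
            llm_generate=lambda p, n: [f"[LLM draft for: {p}]"] * n,
            validate_grammar=lambda text: len(text) < 200)
        print(json.dumps({s: respond(s, cache) for s in states}, indent=2))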