Jake W. Vincent
2025
Controllable Conversational Theme Detection Track at DSTC 12
Igor Shalyminov | Hang Su | Jake W. Vincent | Siffi Singh | Jason Cai | James Gung | Raphael Shu | Saab Mansour
Proceedings of the Twelfth Dialog System Technology Challenge
Conversational analytics has been at the forefront of transformation driven by advances in Speech and Natural Language Processing techniques. The rapid adoption of Large Language Models (LLMs) in the analytics field has taken the problems that can be automated to a new level of complexity and scale. In this paper, we introduce Theme Detection as a critical task in conversational analytics, aimed at automatically identifying and categorizing topics within conversations. This process can significantly reduce the manual effort involved in analyzing expansive dialogs, particularly in domains like customer support or sales. Unlike traditional dialog intent detection, which often relies on a fixed set of intents for downstream system logic, themes are intended as a direct, user-facing summary of the conversation’s core inquiry. This distinction allows for greater flexibility in theme surface forms and user-specific customizations. We pose the Controllable Conversational Theme Detection problem as a public competition track at the Dialog System Technology Challenge (DSTC) 12. The task is framed as joint clustering and theme labeling of dialog utterances, with the distinctive aspect being controllability of the resulting theme clusters’ granularity, achieved via the provided user preference data. We give an overview of the problem, the associated dataset, and the evaluation metrics, both automatic and human. Finally, we discuss the participant teams’ submissions and provide insights from them. The track materials (data and code) are openly available in the GitHub repository.
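Below is a minimal, hypothetical sketch of the joint clustering setup the abstract describes, with granularity controlled by user preference data. The TF-IDF features, the threshold sweep, and the preference-agreement scoring are illustrative assumptions, not the track's baseline; in the actual task, a surface-form theme label would additionally be generated for each cluster (e.g., by an LLM).

```python
# Illustrative sketch only: cluster utterances into themes, then pick the
# cluster granularity that best agrees with user preference judgments.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

def detect_themes(utterances, preference_pairs):
    """Cluster dialog utterances; `preference_pairs` holds (i, j, same_theme)
    judgments saying whether utterances i and j should share a theme."""
    X = TfidfVectorizer().fit_transform(utterances)  # stand-in for embeddings
    D = cosine_distances(X)
    best_labels, best_score = None, -1.0
    # Sweep the merge threshold: larger thresholds yield coarser themes.
    for t in (0.4, 0.6, 0.8, 1.0):
        labels = AgglomerativeClustering(
            n_clusters=None, distance_threshold=t,
            metric="precomputed", linkage="average",
        ).fit_predict(D)
        # Fraction of user preferences the clustering satisfies.
        score = sum(
            (labels[i] == labels[j]) == same
            for i, j, same in preference_pairs
        ) / max(len(preference_pairs), 1)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels
```

The key design point the sketch tries to capture is that the same utterances admit many valid clusterings; the preference pairs act as the control signal that selects one granularity among them.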
Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
Mahnaz Koupaee | Jake W. Vincent | Saab Mansour | Igor Shalyminov | Han He | Hwanjun Song | Raphael Shu | Jianfeng He | Yi Nian | Amy Wing-mei Wong | Kyu J. Han | Hang Su
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Faithfulness evaluators based on Large Language Models (LLMs) are often fooled by the fluency of the text and struggle to identify errors in summaries, usually leading to a high false negative rate. We propose an approach to summary faithfulness evaluation in which multiple LLM-based agents are each assigned an initial stance (regardless of what their actual belief might be) and forced to come up with reasons justifying the imposed stance, thus engaging in a multi-round debate to reach an agreement. The uniformly distributed initial assignments result in a greater diversity of stances, leading to more meaningful debates and ultimately more errors identified. Furthermore, by analyzing recent faithfulness evaluation datasets, we observe that it is not always the case that a summary is simply faithful or unfaithful to the source document. We therefore introduce a new dimension, ambiguity, and a detailed taxonomy to identify such special cases. Experiments demonstrate that our approach can help identify ambiguities and achieves even stronger performance on non-ambiguous summaries.
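The following is an illustrative sketch, under stated assumptions, of the debate-with-imposed-stances idea from the abstract: agents receive uniformly distributed initial stances, argue for them, and then revise over multiple rounds. `call_llm`, the prompts, and the majority-vote aggregation are hypothetical placeholders, not the paper's released implementation.

```python
# Hypothetical sketch: multi-agent debate with imposed initial stances.
# `call_llm(prompt) -> str` is a stand-in for any chat-completion API.
from collections import Counter

STANCES = ["faithful", "unfaithful", "ambiguous"]

def debate(call_llm, document, summary, n_agents=3, n_rounds=2):
    # Impose initial stances uniformly, regardless of each agent's own belief.
    stances = [STANCES[i % len(STANCES)] for i in range(n_agents)]
    arguments = [
        call_llm(
            f"You must argue that the summary is {stance}. "
            f"Give your strongest justification.\n"
            f"Document: {document}\nSummary: {summary}"
        )
        for stance in stances
    ]
    for _ in range(n_rounds):
        new_stances, new_arguments = [], []
        for i in range(n_agents):
            others = "\n---\n".join(a for j, a in enumerate(arguments) if j != i)
            reply = call_llm(
                f"Other agents argued:\n{others}\n"
                f"State your stance as one of {STANCES} on the first line, "
                f"then justify it.\nDocument: {document}\nSummary: {summary}"
            )
            # Parse the stance keyword; keep the old stance if none is found.
            new_stances.append(
                next((s for s in STANCES if s in reply.lower()), stances[i])
            )
            new_arguments.append(reply)
        stances, arguments = new_stances, new_arguments
    # Aggregate the final stances by majority vote.
    return Counter(stances).most_common(1)[0][0]
```

Forcing the initial assignment (rather than letting every agent default to "faithful" for fluent text) is what produces the stance diversity the abstract credits for surfacing more errors.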