Ken Fukuda

2026

Evidential Semantic Entropy for LLM Uncertainty Quantification
Lucie Kunitomo-Jacquin | Edison Marrese-Taylor | Ken Fukuda | Masahiro Hamasaki
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Quantifying uncertainty in large language models (LLMs) is crucial for applications where safety is a concern, as it helps identify factually incorrect LLM answers, commonly referred to as hallucinations. Recently, advancements have been made in quantifying uncertainty, specifically by incorporating the semantics of sampled answers to estimate entropy. These methods typically rely on a normalized probability that is calculated using a limited number of sampled answers. However, we note these estimation methods fail to account for the effects of the semantics that are possible to be obtained as answers, but are not observed in the sample. This is a significant oversight, since a heavier tail of unobserved answer probabilities indicates a higher level of overall uncertainty. To alleviate this issue, we propose Evidential Semantic Entropy (EVSE), which leverages evidence theory to represent both total ignorance arising from unobserved answers and partial ignorance stemming from the semantic relationships among the observed answers. Experiments show that EVSE significantly improves uncertainty quantification performance. Our code is available at: https://github.com/lucieK-J/EvidentialSemanticEntropy.git.

2025

pdf bib abs

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
Kimihiro Hasegawa | Wiradee Imrattanatrai | Zhi-Qi Cheng | Masaki Asada | Susan Holm | Yuran Wang | Ken Fukuda | Teruko Mitamura
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Multimodal systems have great potential to assist humans in procedural activities, where people follow instructions to achieve their goals. Despite diverse application scenarios, systems are typically evaluated on traditional classification tasks, e.g., action recognition or temporal action localization. In this paper, we present a novel evaluation dataset, ProMQA, to measure the advancement of systems in application-oriented scenarios. ProMQA consists of 401 multimodal procedural QA pairs on user recording of procedural activities, i.e., cooking, coupled with their corresponding instruction. For QA annotation, we take a cost-effective human-LLM collaborative approach, where the existing annotation is augmented with LLM-generated QA pairs that are later verified by humans. We then provide the benchmark results to set the baseline performance on ProMQA. Our experiment reveals a significant gap between human performance and that of current systems, including competitive proprietary multimodal models. We hope our dataset sheds light on new aspects of models’ multimodal understanding capabilities.

pdf bib abs

On the Role of Unobserved Sequences on Sample-based Uncertainty Quantification for LLMs
Lucie Kunitomo-Jacquin | Edison Marrese-Taylor | Ken Fukuda
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)

Quantifying uncertainty in large language models (LLMs) is important for safety-critical applications because it helps spot incorrect answers, known as hallucinations. One major trend of uncertainty quantification methods is based on estimating the entropy of the distribution of the LLM’s potential output sequences. This estimation is based on a set of output sequences and associated probabilities obtained by querying the LLM several times. In this paper, we advocate and experimentally and show that the probability of unobserved sequences plays a crucial role, and we recommend future research to integrate it to enhance such LLM uncertainty quantification methods.

2023

pdf bib abs

End-to-End Task-Oriented Dialogue Systems Based on Schema
Wiradee Imrattanatrai | Ken Fukuda
Findings of the Association for Computational Linguistics: ACL 2023

This paper presents a schema-aware end-to-end neural network model for handling task-oriented dialogues based on a dynamic set of slots within a schema. Contrary to existing studies that proposed end-to-end approaches for task-oriented dialogue systems by relying on a unified schema across domains, we design our approach to support a domain covering multiple services where diverse schemas are available. To enable better generalizability among services and domains with different schemas, we supply the schema’s context information including slot descriptions and value constraints to the model. The experimental results on a well-known Schema-Guided Dialogue (SGD) dataset demonstrated the performance improvement by the proposed model compared to state-of-the-art baselines in terms of end-to-end modeling, dialogue state tracking task, and generalization on new services and domains using a limited number of dialogues.

Co-authors

Venues

Fix author