Teruhisa Misu


2024

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
Kaiwen Zhou | Kwonjoon Lee | Teruhisa Misu | Xin Wang
Findings of the Association for Computational Linguistics: ACL 2024

In our work, we explore the synergistic capabilities of pre-trained vision-and-language models (VLMs) and large language models (LLMs) on visual commonsense reasoning (VCR) problems. We find that VLM-based and LLM-based decision pipelines are good at different kinds of VCR problems. Pre-trained VLMs exhibit strong performance on problems that involve understanding the literal visual content, which we refer to as visual commonsense understanding (VCU). For problems where the goal is to infer conclusions beyond the image content, which we refer to as visual commonsense inference (VCI), VLMs face difficulties, while LLMs, given sufficient visual evidence, can use commonsense to infer the answer well. We empirically validate this by letting LLMs classify VCR problems into these two categories and showing a significant difference between the VLM pipeline and the LLM-with-image-caption pipeline on the two subproblems. Moreover, we identify a challenge with VLMs’ passive perception, which may miss crucial context information, leading to incorrect reasoning by LLMs. Based on these findings, we suggest a collaborative approach, named ViCor, in which pre-trained LLMs serve as problem classifiers that analyze the problem category and then either use VLMs to answer the question directly or actively instruct VLMs to concentrate on and gather relevant visual elements to support potential commonsense inferences. We evaluate our framework on two VCR benchmark datasets and outperform all other methods that do not use in-domain fine-tuning.

GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation
Aman Jain | Teruhisa Misu | Kentaro Yamada | Hitomi Yanaka
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

The Vision-and-Language Navigation (VLN) task involves navigating mobility using linguistic commands and has applications in developing interfaces for autonomous mobility. In reality, natural human communication also encompasses non-verbal cues such as hand gestures and gaze. Such gesture-guided instructions have been explored in Human-Robot Interaction systems for effective interaction, particularly in object-referring expressions. However, a notable gap exists in tackling gesture-based demonstrative expressions in the outdoor VLN task. To address this, we introduce a novel dataset for gesture-guided outdoor VLN instructions with demonstrative expressions, designed with a focus on complex instructions that require multi-hop reasoning across the multiple input modalities. Our work also includes a comprehensive analysis of the collected data and a comparative evaluation against existing datasets.

2014

Situated Language Understanding at 25 Miles per Hour
Teruhisa Misu | Antoine Raux | Rakesh Gupta | Ian Lane
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2012

Reinforcement Learning of Question-Answering Dialogue Policies for Virtual Museum Guides
Teruhisa Misu | Kallirroi Georgila | Anton Leuski | David Traum
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2011

Toward Construction of Spoken Dialogue System that Evokes Users’ Spontaneous Backchannels
Teruhisa Misu | Etsuo Mizukami | Yoshinori Shiga | Shinichi Kawamoto | Hisashi Kawai | Satoshi Nakamura
Proceedings of the SIGDIAL 2011 Conference

Similarity Based Language Model Construction for Voice Activated Open-Domain Question Answering
István Varga | Kiyonori Ohtake | Kentaro Torisawa | Stijn De Saeger | Teruhisa Misu | Shigeki Matsuda | Jun’ichi Kazama
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Modeling Spoken Decision Making Dialogue and Optimization of its Dialogue Strategy
Teruhisa Misu | Komei Sugiura | Kiyonori Ohtake | Chiori Hori | Hideki Kashioka | Hisashi Kawai | Satoshi Nakamura
Proceedings of the SIGDIAL 2010 Conference

Dialogue Acts Annotation for NICT Kyoto Tour Dialogue Corpus to Construct Statistical Dialogue Systems
Kiyonori Ohtake | Teruhisa Misu | Chiori Hori | Hideki Kashioka | Satoshi Nakamura
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a new corpus of consulting dialogues designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions from the tagged dialogue corpus. We have collected more than 150 hours of consulting dialogues in the tourist guidance domain. We are developing a corpus that consists of speech, transcripts, speech act (SA) tags, morphological analysis results, dependency analysis results, and semantic content tags. This paper outlines our taxonomy of dialogue act (DA) annotation, which describes two aspects of an utterance: its communicative function (SA) and its semantic content. We provide an overview of the Kyoto tour dialogue corpus and a preliminary analysis using the DA tags. We also show the result of a preliminary experiment on SA tagging with Support Vector Machines (SVMs). We describe the current state of corpus development. In addition, we discuss the use of our corpus in the spoken dialogue system that is being developed.

2009

Annotating Dialogue Acts to Construct Dialogue Systems for Consulting
Kiyonori Ohtake | Teruhisa Misu | Chiori Hori | Hideki Kashioka | Satoshi Nakamura
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Bayes Risk-based Dialogue Management for Document Retrieval System with Speech Interface
Teruhisa Misu | Tatsuya Kawahara
Coling 2008: Companion volume: Posters

2005

Speech-based Information Retrieval System with Clarification Dialogue Strategy
Teruhisa Misu | Tatsuya Kawahara
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Efficient Confirmation Strategy for Large-scale Text Retrieval Systems with Spoken Dialogue Interface
Kazunori Komatani | Teruhisa Misu | Tatsuya Kawahara | Hiroshi G. Okuno
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Dialog Navigator : A Spoken Dialog Q-A System based on Large Text Knowledge Base
Yoji Kiyota | Sadao Kurohashi | Teruhisa Misu | Kazunori Komatani | Tatsuya Kawahara | Fuyuko Kido
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics