Naoto Iwahashi


2016

Comparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary
Ye Kyaw Thu | Win Pa Pa | Yoshinori Sagisaka | Naoto Iwahashi
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

Grapheme-to-Phoneme (G2P) conversion is the task of predicting the pronunciation of a word given its graphemic or written form. It is a highly important part of both automatic speech recognition (ASR) and text-to-speech (TTS) systems. In this paper, we evaluate seven G2P conversion approaches on a manually tagged Myanmar phoneme dictionary: Adaptive Regularization of Weight Vectors (AROW) based structured learning (S-AROW), Conditional Random Field (CRF), Joint-sequence models (JSM), phrase-based statistical machine translation (PBSMT), Recurrent Neural Network (RNN), Support Vector Machine (SVM) based point-wise classification, and Weighted Finite-state Transducers (WFST). The G2P bootstrapping experimental results were measured both by automatic phoneme error rate (PER) calculation and by manual checking for voiced/unvoiced, tone, consonant and vowel errors. The results show that the CRF, PBSMT and WFST approaches are the best performing methods for G2P conversion in the Myanmar language.

2006

Multimedia Database of Meetings and Informal Interactions for Tracking Participant Involvement and Discourse Flow
Nick Campbell | Toshiyuki Sadanobu | Masataka Imura | Naoto Iwahashi | Suzuki Noriko | Damien Douxchamps
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

At ATR, we are collecting and analysing “meetings” data using a table-top sensor device consisting of a small 360-degree camera surrounded by an array of high-quality directional microphones. This equipment provides a stream of information about the audio and visual events of the meeting, which is then processed to form a representation of the verbal and non-verbal interpersonal activity, or discourse flow, during the meeting. This paper describes the resulting corpus of speech and video data which is being collected for the above research. It currently includes data from 12 monthly sessions, comprising 71 video and 33 audio modules. Collection is continuing monthly and is scheduled to include another ten sessions.

2003

A Method for Forming Mutual Beliefs for Communication through Human-robot Multi-modal Interaction
Naoto Iwahashi
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue