Satoshi Nakamura


2021

ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions
Shohei Tanaka | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Human-assisting systems such as dialogue systems must take thoughtful, appropriate actions not only for clear and unambiguous user requests, but also for ambiguous user requests, even if the users themselves are not aware of their potential requirements. To construct such a dialogue agent, we collected a corpus and developed a model that classifies ambiguous user requests into corresponding system actions. In order to collect a high-quality corpus, we asked workers to input antecedent user requests whose pre-defined actions could be regarded as thoughtful. Although multiple actions could be identified as thoughtful for a single user request, annotating all combinations of user requests and system actions is impractical. For this reason, we fully annotated only the test data and left the annotation of the training data incomplete. In order to train the classification model on such training data, we applied the positive/unlabeled (PU) learning method, which assumes that only a part of the data is labeled with positive examples. The experimental results show that the PU learning method achieved better performance than the general positive/negative (PN) learning method to classify thoughtful actions given an ambiguous user request.
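The positive/unlabeled setting described above can be illustrated with the non-negative PU risk, one widely used PU objective. This is a minimal sketch only: the abstract does not say which PU estimator the authors used, and the names `sigmoid_loss`, `nn_pu_risk`, and the `prior` value (the assumed fraction of positives among unlabeled data) are hypothetical choices for illustration.

```python
import math
from typing import List


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss: small when the score agrees in sign with the label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def nn_pu_risk(pos_scores: List[float], unl_scores: List[float], prior: float) -> float:
    """Non-negative PU risk from classifier scores (illustrative sketch).

    pos_scores: scores of examples labeled positive.
    unl_scores: scores of unlabeled examples (a mix of positives and negatives).
    prior: assumed class prior of positives (a hyperparameter in this sketch).
    """
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # The negative-class risk is estimated from unlabeled data after removing
    # the positive share; it is clipped at zero so it cannot become negative.
    negative_part = max(0.0, risk_unl_as_neg - prior * risk_pos_as_neg)
    return prior * risk_pos_as_pos + negative_part
```

In contrast, plain PN learning would treat every unlabeled request-action pair as a negative example, which penalizes thoughtful actions that simply were not annotated.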

FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This year's evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, data and evaluation metrics, and reports results of the received submissions.

NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task
Ryo Fukuda | Yui Oka | Yasumasa Kano | Yuki Yano | Yuka Ko | Hirotaka Tokuyama | Kosuke Doi | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in IWSLT 2021 Evaluation Campaign. Our primary submission is based on wait-k neural machine translation with sequence-level knowledge distillation to encourage literal translation.
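The wait-k policy mentioned above reads k source tokens before emitting the first target token and then alternates between writing and reading. The sketch below only generates that READ/WRITE schedule; the function name `wait_k_schedule` is hypothetical, and nothing here reflects the details of NAIST's submitted system.

```python
from typing import List


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[str]:
    """Return the READ/WRITE action sequence of a wait-k policy."""
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[str] = []
    read = 0
    written = 0
    while written < target_len:
        # Before writing target token t (0-based), wait-k requires
        # min(t + k, source_len) source tokens to have been read.
        needed = min(written + k, source_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
        written += 1
    return actions


if __name__ == "__main__":
    print(wait_k_schedule(source_len=4, target_len=4, k=2))
```

Once the source is exhausted, the remaining target tokens are written without further reads.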

On Knowledge Distillation for Translating Erroneous Speech Transcriptions
Ryo Fukuda | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Recent studies argue that knowledge distillation is promising for speech translation (ST) using end-to-end models. In this work, we investigate the effect of knowledge distillation with a cascade ST using automatic speech recognition (ASR) and machine translation (MT) models. We distill knowledge from a teacher model based on human transcripts to a student model based on erroneous transcriptions. Our experimental results demonstrated that knowledge distillation is beneficial for a cascade ST. Further investigation that combined knowledge distillation and fine-tuning revealed that the combination consistently improved two language pairs: English-Italian and Spanish-English.

Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data
Kosuke Doi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.

Is This Translation Error Critical?: Classification-Based Human and Automatic Machine Translation Evaluation Focusing on Critical Errors
Katsuhito Sudoh | Kosuke Takahashi | Satoshi Nakamura
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

This paper discusses a classification-based approach to machine translation evaluation, as opposed to a common regression-based approach in the WMT Metrics task. Recent machine translation usually works well but sometimes makes critical errors due to just a few wrong word choices. Our classification-based approach focuses on such errors using several error type labels, for practical machine translation evaluation in an age of neural machine translation. We made additional annotations on the WMT 2015-2017 Metrics datasets with fluency and adequacy labels to distinguish different types of translation errors from syntactic and semantic viewpoints. We present our human evaluation criteria for the corpus development and automatic evaluation experiments using the corpus. The human evaluation corpus will be publicly available upon publication.

2020

Improving Spoken Language Understanding by Wisdom of Crowds
Koichiro Yoshino | Kana Ikeuchi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 28th International Conference on Computational Linguistics

Spoken language understanding (SLU), which converts user requests in natural language to machine-interpretable expressions, is becoming an essential task. The lack of training data is an important problem, especially for new system tasks, because existing SLU systems are based on statistical approaches. In this paper, we propose using two sources of the “wisdom of crowds,” crowdsourcing and a knowledge community website, to improve the SLU system. We first collected paraphrasing variations for new system tasks through crowdsourcing as seed data, and then augmented them using similar questions from a knowledge community website. We investigated the effects of the proposed data augmentation method in an SLU task, even with small seed data. In particular, the proposed architecture added more than 120,000 augmented samples, which improved SLU accuracy.

Incorporating Noisy Length Constraints into Transformer with Length-aware Positional Encodings
Yui Oka | Katsuki Chousa | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 28th International Conference on Computational Linguistics

Neural Machine Translation often suffers from an under-translation problem due to its limited modeling of output sequence lengths. In this work, we propose a novel approach to training a Transformer model using length constraints based on length-aware positional encoding (PE). Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the PE during training. In the inference step, we predict the output lengths using input sequences and a BERT-based length prediction model. Experimental results on ASPEC English-to-Japanese translation showed that the proposed method produced translations with lengths close to the reference ones and outperformed a vanilla Transformer (especially in short sentences) by 3.22 points in BLEU. The average translation results using our length prediction model were also better than those of another baseline method using input lengths for the length constraints. The proposed noise injection improved robustness to length prediction errors, especially within the window size.
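The noise injection described above can be pictured as perturbing the target length that is fed to the positional encoding at training time. The sketch below assumes uniform integer noise and a lower bound of 1; both are illustrative assumptions, since the abstract only says the noise lies within a window. The name `noisy_length_constraint` is hypothetical.

```python
import random


def noisy_length_constraint(true_length: int, window: int, rng: random.Random) -> int:
    """Perturb a target length by integer noise drawn from [-window, window].

    Uniform noise and clipping at 1 are assumptions made for this sketch.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    noise = rng.randint(-window, window)  # randint includes both endpoints
    return max(1, true_length + noise)


if __name__ == "__main__":
    rng = random.Random(0)
    # During training the model would see a slightly wrong length, so that it
    # tolerates length prediction errors of similar size at inference time.
    print([noisy_length_constraint(20, 3, rng) for _ in range(5)])
```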

Emotional Speech Corpus for Persuasive Dialogue System
Sara Asai | Koichiro Yoshino | Seitaro Shinagawa | Sakriani Sakti | Satoshi Nakamura
Proceedings of the 12th Language Resources and Evaluation Conference

Expressing emotion is known to be an effective way to persuade one’s dialogue partner to accept one’s claim or proposal. Emotional expression in speech can convey the speaker’s emotion more directly than emotional expression in text alone, which leads to a more persuasive dialogue. In this paper, we built a speech dialogue corpus in a persuasive scenario, with the aim of building a persuasive dialogue system that uses emotional expressions. We extended an existing text dialogue corpus by adding variations of emotional responses, collected by crowdsourcing, to cover different combinations of broad dialogue context and a variety of emotional states. Then, we recorded emotional speech consisting of the collected emotional expressions spoken by a voice actor. The experimental results indicate that the collected emotional expressions with their speech have higher emotional expressiveness for conveying the system’s emotion to users.

Proceedings of the 17th International Conference on Spoken Language Translation
Marcello Federico | Alex Waibel | Kevin Knight | Satoshi Nakamura | Hermann Ney | Jan Niehues | Sebastian Stüker | Dekai Wu | Joseph Mariani | Francois Yvon
Proceedings of the 17th International Conference on Spoken Language Translation

NAIST’s Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task
Ryo Fukuda | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes NAIST’s NMT system submitted to the IWSLT 2020 conversational speech translation task. We focus on the translation of disfluent speech transcripts that include ASR errors and non-grammatical utterances. We tried a domain adaptation method that transfers the style of out-of-domain data (United Nations Parallel Corpus) to resemble in-domain data (Fisher transcripts). Our results showed that the NMT model with domain adaptation outperformed a baseline. In addition, a slight improvement from the style transfer was observed.

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model
Kosuke Takahashi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose an automatic evaluation method of machine translation that uses source language sentences regarded as additional pseudo references. The proposed method evaluates a translation hypothesis in a regression model. The model takes the paired source, reference, and hypothesis sentence all together as an input. A pretrained large scale cross-lingual language model encodes the input to sentence-pair vectors, and the model predicts a human evaluation score with those vectors. Our experiments show that our proposed method using Cross-lingual Language Model (XLM) trained with a translation language modeling (TLM) objective achieves a higher correlation with human judgments than a baseline method that uses only hypothesis and reference sentences. Additionally, using source sentences in our proposed method is confirmed to improve the evaluation performance.

Reflection-based Word Attribute Transfer
Yoichi Ishibashi | Katsuhito Sudoh | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Word embeddings, which often represent such analogic relations as king - man + woman = queen, can be used to change a word’s attribute, including its gender. For transferring king into queen in this analogy-based manner, we subtract a difference vector man - woman based on the knowledge that king is male. However, developing such knowledge is very costly for words and attributes. In this work, we propose a novel method for word attribute transfer based on reflection mappings without such an analogy operation. Experimental results show that our proposed method can transfer the word attributes of the given words without changing the words that do not have the target attributes.
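A reflection mapping mirrors a vector across a hyperplane. The sketch below shows the geometry only, with a hand-fixed mirror given by a normal vector `a` and a point `c` on the hyperplane; how the mirror is obtained is not stated in the abstract, so fixing it by hand is an assumption made purely for illustration, and the function name `reflect` is hypothetical.

```python
from typing import List


def reflect(v: List[float], a: List[float], c: List[float]) -> List[float]:
    """Reflect v across the hyperplane with normal a that passes through c.

    Formula: v - 2 * ((v - c) . a / (a . a)) * a
    """
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("normal vector must be non-zero")
    scale = 2.0 * sum((vi - ci) * ai for vi, ci, ai in zip(v, c, a)) / norm_sq
    return [vi - scale * ai for vi, ai in zip(v, a)]


if __name__ == "__main__":
    mirror_normal = [1.0, 0.0]  # toy "attribute" axis (illustrative only)
    mirror_point = [0.0, 0.0]
    word = [3.0, 1.0]
    flipped = reflect(word, mirror_normal, mirror_point)
    print(flipped)                                        # attribute side flipped
    print(reflect(flipped, mirror_normal, mirror_point))  # back to the original
```

Two properties make reflection attractive here: applying it twice returns the original vector, and any vector lying on the mirror is left unchanged, which is one way a word without the target attribute could stay fixed.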

Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis
Sashi Novitasari | Andros Tjandra | Sakriani Sakti | Satoshi Nakamura
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Even though over seven hundred ethnic languages are spoken in Indonesia, the technology available to support communication within indigenous communities, as well as with people outside the villages, remains limited. As a result, indigenous communities still face isolation due to cultural barriers, and languages continue to disappear. To accelerate communication, speech-to-speech translation (S2ST) technology is one approach that can overcome language barriers. However, S2ST systems require machine translation (MT), speech recognition (ASR), and synthesis (TTS) that rely heavily on supervised training and a broad set of language resources that can be difficult to collect from ethnic communities. Recently, a machine speech chain mechanism was proposed to enable ASR and TTS to assist each other in semi-supervised learning. The framework was initially implemented only in a monolingual setting. In this study, we focus on developing speech recognition and synthesis for these Indonesian ethnic languages: Javanese, Sundanese, Balinese, and Bataks. We first separately train ASR and TTS of standard Indonesian in supervised training. We then develop ASR and TTS of ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data, removing the need for paired speech-text data of those ethnic languages.

2019

Conversational Response Re-ranking Based on Event Causality and Role Factored Tensor Event Embedding
Shohei Tanaka | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the First Workshop on NLP for Conversational AI

We propose a novel method for selecting coherent and diverse responses for a given dialogue context. The proposed method re-ranks response candidates generated from conversational models by using event causality relations between events in a dialogue history and response candidates (e.g., “be stressed out” precedes “relieve stress”). We use distributed event representation based on the Role Factored Tensor Model for a robust matching of event causality relations due to limited event causality knowledge of the system. Experimental results showed that the proposed method improved coherency and dialogue continuity of system responses.

Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Satoshi Nakamura | Milica Gasic | Ingrid Zuckerman | Gabriel Skantze | Mikio Nakano | Alexandros Papangelis | Stefan Ultes | Koichiro Yoshino
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Neural Conversation Model Controllable by Given Dialogue Act Based on Adversarial Learning and Label-aware Objective
Seiya Kawano | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 12th International Conference on Natural Language Generation

Building a controllable neural conversation model (NCM) is an important task. In this paper, we focus on controlling the responses of NCMs by using dialogue act labels of responses as conditions. We introduce an adversarial learning framework for the task of generating conditional responses with a new objective to a discriminator, which explicitly distinguishes sentences by using labels. This change strongly encourages the generation of label-conditioned sentences. We compared the proposed method with some existing methods for generating conditional responses. The experimental results show that our proposed method has higher controllability for dialogue acts while achieving higher or comparable naturalness relative to existing methods.

2018

Guiding Neural Machine Translation with Retrieved Translation Pieces
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar to the input sentence, and then collect n-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call “translation pieces”. We compute pseudo-probabilities for each retrieved sentence based on similarities between the input sentence and the retrieved source sentences, and use these to weight the retrieved translation pieces. Finally, an existing NMT model is used to translate the input sentence, with an additional bonus given to outputs that contain the collected translation pieces. We show our method improves NMT translation results by up to 6 BLEU points on three narrow domain translation tasks where repetitiveness of the target sentences is particularly salient. It also causes little increase in the translation time, and compares favorably to an alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation.

Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing
Koichiro Yoshino | Yoko Ishikawa | Masahiro Mizukami | Yu Suzuki | Sakriani Sakti | Satoshi Nakamura
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags
Koichiro Yoshino | Hiroki Tanaka | Kyoshiro Sugiyama | Makoto Kondo | Satoshi Nakamura
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas
Sashi Novitasari | Quoc Truong Do | Sakriani Sakti | Dessi Lestari | Satoshi Nakamura
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Multi-Source Neural Machine Translation with Missing Data
Yuta Nishimura | Katsuhito Sudoh | Graham Neubig | Satoshi Nakamura
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Multi-source translation is an approach to exploit multiple inputs (e.g. in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are not complete due to the difficulty of providing translations in all of the relevant languages (for example, in TED talks, most English talks only have subtitles for a small portion of the languages that TED supports). Existing studies on multi-source translation did not explicitly handle such situations. This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts and examines a very simple implementation where missing source translations are replaced by a special symbol <NULL>. These methods allow us to use incomplete corpora both at training time and test time. In experiments with real incomplete multilingual corpora of TED Talks, the multi-source NMT with the <NULL> tokens achieved higher translation accuracy, as measured by BLEU, than any of the one-to-one NMT systems.
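The <NULL> replacement described above amounts to a small preprocessing step: every source language slot is always filled, with a placeholder where a translation is missing. The sketch below shows that step only; the function name `fill_missing_sources` and the dictionary-based example layout are assumptions for illustration, not the authors' data format.

```python
from typing import Dict, List, Optional

NULL_TOKEN = "<NULL>"  # special symbol named in the abstract


def fill_missing_sources(
    sources: Dict[str, Optional[str]], languages: List[str]
) -> List[str]:
    """Return one source sentence per language, using <NULL> where none exists.

    A language counts as missing if it is absent, None, or an empty string.
    """
    return [sources.get(lang) or NULL_TOKEN for lang in languages]


if __name__ == "__main__":
    example = {"en": "thank you", "fr": None}
    # Spanish is absent and French is None, so both become <NULL>.
    print(fill_missing_sources(example, ["en", "fr", "es"]))
```

Because every example then has the same number of source inputs, the same multi-encoder model can consume complete and incomplete examples alike.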

Unsupervised Counselor Dialogue Clustering for Positive Emotion Elicitation in Neural Dialogue System
Nurul Lubis | Sakriani Sakti | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Positive emotion elicitation seeks to improve the user’s emotional state through dialogue system interaction, where a chat-based scenario is layered with an implicit goal to address the user’s emotional needs. Standard neural dialogue system approaches still fall short in this situation as they tend to generate only short, generic responses. Learning from expert actions is critical, as these potentially differ from standard dialogue acts. In this paper, we propose using a hierarchical neural network for response generation that is conditioned on 1) expert’s action, 2) dialogue context, and 3) user emotion, encoded from user input. We construct a corpus of interactions between a counselor and 30 participants following a negative emotional exposure to learn expert actions and responses in a positive emotion elicitation scenario. Instead of relying on expensive, labor-intensive, and often ambiguous human annotations, we cluster the expert’s responses in an unsupervised manner and use the resulting labels to train the network. Our experiments and evaluation show that the proposed approach yields lower perplexity and generates a larger variety of responses.

2017

An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
Makoto Morishita | Yusuke Oda | Graham Neubig | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the First Workshop on Neural Machine Translation

Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.
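The padding effect mentioned above is easy to quantify: each sentence in a mini-batch is padded up to the longest sentence in that batch. The sketch below counts padding tokens for a given batching order; the function name `padding_tokens` and the toy sentence lengths are illustrative assumptions and do not correspond to any toolkit's strategy.

```python
from typing import List


def padding_tokens(lengths: List[int], batch_size: int) -> int:
    """Count padding tokens when consecutive sentences are grouped into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        # Every sentence is padded up to the longest one in its batch.
        total += sum(longest - n for n in batch)
    return total


if __name__ == "__main__":
    corpus_lengths = [3, 10, 4, 9]
    print(padding_tokens(corpus_lengths, batch_size=2))          # corpus order
    print(padding_tokens(sorted(corpus_lengths), batch_size=2))  # length-sorted
```

Less padding means less wasted computation, but as the abstract notes, this efficiency gain does not guarantee that length-sorted batches train better than shuffled ones.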

Tree as a Pivot: Syntactic Matching Methods in Pivot Translation
Akiva Miura | Graham Neubig | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the Second Conference on Machine Translation

NICT-NAIST System for WMT17 Multimodal Translation Task
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Second Conference on Machine Translation

Information Navigation System with Discovering User Interests
Koichiro Yoshino | Yu Suzuki | Satoshi Nakamura
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We demonstrate an information navigation system for sightseeing domains that has a dialogue interface for discovering user interests for tourist activities. The system discovers the interests of a user through focus detection on user utterances, and proactively presents information related to the discovered user interest. A partially observable Markov decision process (POMDP)-based dialogue manager, which is extended with user focus states, controls the behavior of the system, which provides information through several dialogue acts. We transferred the belief-update function and the policy of the manager from another system trained on a different domain to show the generality of the defined dialogue acts for our information navigation system.

A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task
Yusuke Oda | Katsuhito Sudoh | Satoshi Nakamura | Masao Utiyama | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper describes the details of the NAIST-NICT machine translation system for the WAT2017 English-Japanese Scientific Paper Translation Task. The system consists of a language-independent tokenizer and an attentional encoder-decoder style neural machine translation model. According to the official results, our system achieves higher translation accuracy than any system submitted to previous campaigns despite its simple model architecture.

Neural Machine Translation via Binary Code Prediction
Yusuke Oda | Philip Arthur | Graham Neubig | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce computation time/memory requirements of the output layer to be logarithmic in vocabulary size in the best case. In addition, we also introduce two advanced approaches to improve the robustness of the proposed model: using error-correcting codes and combining softmax and binary codes. Experiments on two English-Japanese bidirectional translation tasks show that the proposed models achieve BLEU scores that approach those of the softmax, while reducing memory usage to less than 1/10 and improving decoding speed on CPUs by 5x to 10x.
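The logarithmic saving comes from the fact that a vocabulary of V words can be indexed with about log2(V) bits, so the output layer predicts bits rather than one score per word. The sketch below assigns each word the binary representation of its integer ID; this assignment and the helper names are assumptions for illustration, since the abstract does not describe how codes are assigned.

```python
from typing import List


def code_length(vocab_size: int) -> int:
    """Number of bits needed to give every word ID in [0, vocab_size) a unique code."""
    if vocab_size < 1:
        raise ValueError("vocab_size must be positive")
    return max(1, (vocab_size - 1).bit_length())


def encode(word_id: int, n_bits: int) -> List[int]:
    """Binary code of a word ID, most significant bit first."""
    if not 0 <= word_id < 2 ** n_bits:
        raise ValueError("word_id does not fit in n_bits")
    return [(word_id >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


def decode(bits: List[int]) -> int:
    """Recover the word ID from its bits (e.g. thresholded model outputs)."""
    word_id = 0
    for bit in bits:
        word_id = (word_id << 1) | bit
    return word_id


if __name__ == "__main__":
    n_bits = code_length(50000)
    print(n_bits)              # output units needed instead of 50000 scores
    print(encode(5, 4))
    print(decode(encode(37, n_bits)))
```

A single wrongly predicted bit decodes to a different word, which is the kind of fragility the error-correcting codes mentioned in the abstract are meant to address.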

Improving Neural Machine Translation through Phrase-based Forced Decoding
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using the phrase-based decoding cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of phrase-based SMT is limited by the phrase-based translation rule table. We propose a soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the forced decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.

Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing
Andros Tjandra | Sakriani Sakti | Satoshi Nakamura
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target sequence. Most attentional mechanisms used today are based on a global attention property which requires a computation of a weighted summarization of the whole input sequence generated by encoder states. However, it is computationally expensive and often produces misalignment on longer input sequences. Furthermore, it does not fit the monotonic or left-to-right nature of several tasks, such as automatic speech recognition (ASR), grapheme-to-phoneme (G2P), etc. In this paper, we propose a novel attention mechanism that has local and monotonic properties. Various ways to control those properties are also explored. Experimental results on ASR, G2P and machine translation between two languages with similar sentence structures demonstrate that the proposed encoder-decoder model with local monotonic attention could achieve significant performance improvements and reduce the computational complexity in comparison with the one that used the standard global attention architecture.

Acquisition and Assessment of Semantic Content for the Generation of Elaborateness and Indirectness in Spoken Dialogue Systems
Louisa Pragst | Koichiro Yoshino | Wolfgang Minker | Satoshi Nakamura | Stefan Ultes
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In a dialogue system, the dialogue manager selects one of several system actions and thereby determines the system’s behaviour. Defining all possible system actions in a dialogue system by hand is tedious work. While efforts have been made to automatically generate such system actions, those approaches are mostly focused on providing functional system behaviour. Adapting the system behaviour to the user becomes a difficult task due to the limited number of system actions available. We aim to increase the adaptability of a dialogue system by automatically generating variants of system actions. In this work, we introduce an approach to automatically generate action variants for elaborateness and indirectness. Our proposed algorithm extracts RDF triplets from a knowledge base and rates their relevance to the original system action to find suitable content. We show that the results of our algorithm are mostly perceived similarly to human-generated elaborateness and indirectness and can be used to adapt a conversation to the current user and situation. We also discuss where the results of our algorithm are still lacking and how this could be improved: taking into account the conversation topic as well as the culture of the user is likely to have a beneficial effect on the user’s perception.

2016

Incorporating Discrete Translation Lexicons into Neural Machine Translation
Philip Arthur | Graham Neubig | Satoshi Nakamura
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Learning a Lexicon and Translation Model from Phoneme Lattices
Oliver Adams | Graham Neubig | Trevor Cohn | Steven Bird | Quoc Truong Do | Satoshi Nakamura
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Optimizing Computer-Assisted Transcription Quality with Iterative User Interfaces
Matthias Sperber | Graham Neubig | Satoshi Nakamura | Alex Waibel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Computer-assisted transcription promises high-quality speech transcription at reduced costs. This is achieved by limiting human effort to transcribing parts for which automatic transcription quality is insufficient. Our goal is to improve the human transcription quality via appropriate user interface design. We focus on iterative interfaces that allow humans to solve tasks based on an initially given suggestion, in this case an automatic transcription. We conduct a user study that reveals considerable quality gains for three variations of iterative interfaces over a non-iterative from-scratch transcription interface. Our iterative interfaces included post-editing, confidence-enhanced post-editing, and a novel retyping interface. All three yielded similar quality on average, but we found that the proposed retyping interface was less sensitive to the difficulty of the segment, and superior when the automatic transcription of the segment contained relatively many errors. An analysis using mixed-effects models allows us to quantify these and other factors and draw conclusions over which interface design should be chosen in which circumstance.

pdf bib
Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
Nurul Lubis | Randy Gomez | Sakriani Sakti | Keisuke Nakamura | Koichiro Yoshino | Satoshi Nakamura | Kazuhiro Nakadai
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Emotional aspects play a vital role in making human communication a rich and dynamic experience. As we introduce more automated systems into our daily lives, it becomes increasingly important to incorporate emotion to provide as natural an interaction as possible. To achieve this, rich sets of labeled emotional data are a prerequisite. However, in Japanese, existing emotion databases are still limited to unimodal and bimodal corpora. Since emotion is expressed not only through speech but also visually at the same time, it is essential to include multiple modalities in an observation. In this paper, we present the first audio-visual emotion corpus in Japanese, collected from 14 native speakers. The corpus contains 100 minutes of annotated and transcribed material. We performed preliminary emotion recognition experiments on the corpus and achieved an accuracy of 61.42% for five classes of emotion.

pdf bib
Cultural Communication Idiosyncrasies in Human-Computer Interaction
Juliana Miehle | Koichiro Yoshino | Louisa Pragst | Stefan Ultes | Satoshi Nakamura | Wolfgang Minker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Analyzing the Effect of Entrainment on Dialogue Acts
Masahiro Mizukami | Koichiro Yoshino | Graham Neubig | David Traum | Satoshi Nakamura
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation
Akiva Miura | Graham Neubig | Michael Paul | Satoshi Nakamura
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
A Binarized Neural Network Joint Model for Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering
Kyoshiro Sugiyama | Masahiro Mizukami | Graham Neubig | Koichiro Yoshino | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Reinforcement Learning in Multi-Party Trading Dialog
Takuya Hiraoka | Kallirroi Georgila | Elnaz Nouri | David Traum | Satoshi Nakamura
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015
Graham Neubig | Makoto Morishita | Satoshi Nakamura
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

pdf bib
The NAIST English speech recognition system for IWSLT 2015
Michael Heck | Quoc Truong Do | Sakriani Sakti | Graham Neubig | Satoshi Nakamura
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Improving translation of emphasis with pause prediction in speech-to-speech translation systems
Quoc Truong Do | Sakriani Sakti | Graham Neubig | Tomoki Toda | Satoshi Nakamura
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

pdf bib
Parser self-training for syntax-based machine translation
Makoto Morishita | Koichi Akabe | Yuto Hatakoshi | Graham Neubig | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

pdf bib
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents
Yusuke Oda | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Improving Pivot Translation by Remembering the Pivot
Akiva Miura | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Ckylark: A More Robust PCFG-LA Parser
Yusuke Oda | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Semantic Parsing of Ambiguous Input through Paraphrasing and Verification
Philip Arthur | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Transactions of the Association for Computational Linguistics, Volume 3

We propose a new method for semantic parsing of ambiguous and ungrammatical input, such as search queries. We do so by building on an existing semantic parsing framework that uses synchronous context-free grammars (SCFG) to jointly model the input sentence and output meaning representation. We generalize this SCFG framework to allow not one, but multiple outputs. Using this formalism, we construct a grammar that takes an ambiguous input string and jointly maps it into both a meaning representation and a natural language paraphrase that is less ambiguous than the original input. This paraphrase can be used to disambiguate the meaning representation via verification using a language model that calculates the probability of each paraphrase.

2014

pdf bib
Acquiring a Dictionary of Emotion-Provoking Events
Hoa Trong Vu | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
Segmentation for Efficient Supervised Language Annotation with an Explicit Cost-Utility Tradeoff
Matthias Sperber | Mirjam Simantzik | Graham Neubig | Satoshi Nakamura | Alex Waibel
Transactions of the Association for Computational Linguistics, Volume 2

In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff must be found for segment sizes. Choosing short segments allows us to reduce the number of highly confident labels that are supervised by the annotator, which is useful because these labels are often already correct and supervising correct labels is a waste of effort. In contrast, long segments reduce the cognitive effort due to context switches. Our method helps find the segmentation that optimizes supervision efficiency by defining user models to predict the cost and utility of supervising each segment and solving a constrained optimization problem balancing these contradictory objectives. A user study demonstrates noticeable gains over pre-segmented, confidence-ordered baselines on two natural language processing tasks: speech transcription and word segmentation.

pdf bib
Linguistic and Acoustic Features for Automatic Identification of Autism Spectrum Disorders in Children’s Narrative
Hiroki Tanaka | Sakriani Sakti | Graham Neubig | Tomoki Toda | Satoshi Nakamura
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

pdf bib
Rule-based Syntactic Preprocessing for Syntax-based Machine Translation
Yuto Hatakoshi | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Discriminative Language Models as a Tool for Machine Translation Error Analysis
Koichi Akabe | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing
Takuya Hiraoka | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Collection of a Simultaneous Translation Corpus for Comparative Analysis
Hiroaki Shimizu | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. The corpus has two main features. The first is that professional simultaneous interpreters with different amounts of experience cooperated in the collection. By comparing the simultaneous interpretation data of each interpreter, it is possible to compare better interpretations to those that are not as good. The second is that translation data are already available for part of our corpus. This makes it possible to compare translation data with simultaneous interpretation data. We recorded the interpretations of lectures and news, and created time-aligned transcriptions. A total of 387k words of transcribed data were collected. The corpus will be helpful for analyzing differences in interpretation styles and for constructing simultaneous interpretation systems.

pdf bib
Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System
Sakriani Sakti | Keigo Kubo | Sho Matsumiya | Graham Neubig | Tomoki Toda | Satoshi Nakamura | Fumihiro Adachi | Ryosuke Isotani
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper outlines recent developments in multilingual medical data and a multilingual speech recognition system for network-based speech-to-speech translation in the medical domain. The overall speech-to-speech translation (S2ST) system was designed to translate spoken utterances from a given source language into a target language in order to facilitate multilingual conversations and reduce the problems caused by language barriers in medical situations. Our final system utilizes weighted finite-state transducers with n-gram language models. Currently, the system successfully covers three languages: Japanese, English, and Chinese. The difficulties involved in connecting Japanese, English and Chinese speech recognition systems through Web servers will be discussed, and the experimental results in simulated medical conversation will also be presented.

pdf bib
Optimizing Segmentation Strategies for Simultaneous Speech Translation
Yusuke Oda | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Towards High-Reliability Speech Translation in the Medical Domain
Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura | Yuji Matsumoto | Ryosuke Isotani | Yukichi Ikeda
The First Workshop on Natural Language Processing for Medical and Healthcare Fields

pdf bib
The NAIST English speech recognition system for IWSLT 2013
Sakriani Sakti | Keigo Kubo | Graham Neubig | Tomoki Toda | Satoshi Nakamura
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the NAIST English speech recognition system for the IWSLT 2013 Evaluation Campaign. In particular, we participated in the ASR track of the IWSLT TED task. Last year, we participated in collaboration with Karlsruhe Institute of Technology (KIT). This year is the first time we have built a full-fledged ASR system for IWSLT developed solely by NAIST. Our final system utilizes weighted finite-state transducers with four-gram language models. The hypothesis selection is based on the principle of system combination. On the IWSLT official test sets, the system introduced in this work achieves a WER of 9.1% for tst2011, 10.0% for tst2012, and 16.2% for the new tst2013.
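The WER figures quoted in this abstract are word error rates. As a minimal sketch of how such a figure is commonly computed (word-level edit distance divided by reference length), and not the campaign's official scoring tool, the following may help; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and not the official IWSLT scoring procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.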

pdf bib
Constructing a speech translation system using simultaneous interpretation data
Hiroaki Shimizu | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

There has been a fair amount of work on automatic speech translation systems that translate in real-time, serving as a computerized version of a simultaneous interpreter. It has been noticed in the field of translation studies that simultaneous interpreters perform a number of tricks to make the content easier to understand in real-time, including dividing their translations into small chunks, or summarizing less important content. However, the majority of previous work has not specifically considered this fact, simply using translation data (made by translators) for learning of the machine translation system. In this paper, we examine the possibilities of additionally incorporating simultaneous interpretation data (made by simultaneous interpreters) in the learning process. First we collect simultaneous interpretation data from professional simultaneous interpreters of three levels, and perform an analysis of the data. Next, we incorporate the simultaneous interpretation data in the learning of the machine translation system. As a result, the translation style of the system becomes more similar to that of a highly experienced simultaneous interpreter. We also find that according to automatic evaluation metrics, our system achieves performance similar to that of a simultaneous interpreter that has 1 year of experience.

pdf bib
Incremental unsupervised training for university lecture recognition
Michael Heck | Sebastian Stüker | Sakriani Sakti | Alex Waibel | Satoshi Nakamura
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

In this paper we describe our work on unsupervised adaptation of the acoustic model of our simultaneous lecture translation system. We trained a speaker-independent acoustic model, with which we produce automatic transcriptions of new lectures in order to improve the system for a specific lecturer. We compare our results against a model that was trained in a supervised way on an exact manual transcription. We examine four different ways of processing the decoder outputs of the automatic transcription with respect to the treatment of pronunciation variants and noise words. We show that, instead of fixing the latter information in the transcriptions, it is advantageous to let the Viterbi algorithm decide during training which pronunciations to use and where to insert which noise words. Further, we utilize word-level posterior probabilities obtained during decoding by weighting and thresholding the words of a transcription.

2012

pdf bib
The NAIST machine translation system for IWSLT2012
Graham Neubig | Kevin Duh | Masaya Ogushi | Takamoto Kano | Tetsuo Kiso | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the NAIST statistical machine translation system for the IWSLT2012 Evaluation Campaign. We participated in all TED Talk tasks, for a total of 11 language-pairs. For all tasks, we use the Moses phrase-based decoder and its experiment management system as a common base for building translation systems. The focus of our work is on performing a comprehensive comparison of a multitude of existing techniques for the TED task, exploring issues such as out-of-domain data filtering, minimum Bayes risk decoding, MERT vs. PRO tuning, word alignment combination, and morphology.

pdf bib
The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation
Christian Saam | Christian Mohr | Kevin Kilgour | Michael Heck | Matthias Sperber | Keigo Kubo | Sebastian Stüker | Sakriani Sakti | Graham Neubig | Tomoki Toda | Satoshi Nakamura | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes our English Speech-to-Text (STT) systems for the 2012 IWSLT TED ASR track evaluation. The systems consist of 10 subsystems that are combinations of different front-ends, e.g. MVDR based and MFCC based ones, and two different phone sets. The outputs of the subsystems are combined via confusion network combination. Decoding is done in two stages, where the systems of the second stage are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR.

pdf bib
The KIT-NAIST (contrastive) English ASR system for IWSLT 2012
Michael Heck | Keigo Kubo | Matthias Sperber | Sakriani Sakti | Sebastian Stüker | Christian Saam | Kevin Kilgour | Christian Mohr | Graham Neubig | Tomoki Toda | Satoshi Nakamura | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the KIT-NAIST (Contrastive) English speech recognition system for the IWSLT 2012 Evaluation Campaign. In particular, we participated in the ASR track of the IWSLT TED task. The system was developed by Karlsruhe Institute of Technology (KIT) and Nara Institute of Science and Technology (NAIST) teams in collaboration within the interACT project. We employ single system decoding with fully continuous and semi-continuous models, as well as a three-stage, multipass system combination framework built with the Janus Recognition Toolkit. On the IWSLT 2010 test set our single system introduced in this work achieves a WER of 17.6%, and our final combination achieves a WER of 14.4%.

pdf bib
Minimum Bayes-risk decoding extended with similar examples: NAIST-NCT at IWSLT 2012
Hiroaki Shimizu | Masao Utiyama | Eiichiro Sumita | Satoshi Nakamura
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the methods used in the NAIST-NICT submission to the International Workshop on Spoken Language Translation (IWSLT) 2012 evaluation campaign. In particular, we propose two extensions to minimum Bayes-risk decoding, which reduces an expected loss.
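For readers unfamiliar with the baseline technique, the following is a minimal sketch of plain minimum Bayes-risk decoding over an N-best list. It does not implement the two extensions proposed in the paper. The names `mbr_decode` and `unigram_f1`, and the use of unigram F1 as the similarity measure, are assumptions chosen for illustration; sentence-level BLEU is a more typical choice in practice.

```python
from collections import Counter
from typing import List, Tuple


def unigram_f1(a: str, b: str) -> float:
    """Hypothetical similarity measure: F1 over bag-of-words overlap."""
    tokens_a, tokens_b = a.split(), b.split()
    overlap = sum((Counter(tokens_a) & Counter(tokens_b)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(tokens_a)
    recall = overlap / len(tokens_b)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(nbest: List[Tuple[str, float]]) -> str:
    """Return the hypothesis with the lowest expected loss.

    nbest is a list of (hypothesis, score) pairs with non-negative scores;
    scores are renormalized to sum to one over the list. The loss between
    two hypotheses is taken to be 1 - unigram_f1, an illustrative assumption.
    """
    total = sum(score for _, score in nbest)
    best_hyp, best_risk = None, float("inf")
    for candidate, _ in nbest:
        # Expected loss of choosing `candidate`, treating every other
        # hypothesis as a possible reference weighted by its posterior.
        risk = sum(
            (score / total) * (1.0 - unigram_f1(candidate, other))
            for other, score in nbest
        )
        if risk < best_risk:
            best_hyp, best_risk = candidate, risk
    return best_hyp


if __name__ == "__main__":
    nbest = [("the cat sat", 0.4), ("a cat sat", 0.3), ("a cat sits", 0.3)]
    print(mbr_decode(nbest))
```

In this toy list the highest-scoring hypothesis is "the cat sat", but the MBR choice is "a cat sat" because it is, on average, closer to the other likely hypotheses.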

pdf bib
A method for translation of paralinguistic information
Takatomo Kano | Sakriani Sakti | Shinnosuke Takamichi | Graham Neubig | Tomoki Toda | Satoshi Nakamura
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

This paper is concerned with speech-to-speech translation that is sensitive to paralinguistic information. From the many different possible paralinguistic features to handle, in this paper we chose duration and power as a first step, proposing a method that can translate these features from input speech to the output speech in continuous space. This is done in a simple and language-independent fashion by training a regression model that maps source language duration and power information into the target language. We evaluate the proposed method on a digit translation task and show that paralinguistic information in input speech appears in output speech, and that this information can be used by target language speakers to detect emphasis.

2011

pdf bib
Toward Construction of Spoken Dialogue System that Evokes Users’ Spontaneous Backchannels
Teruhisa Misu | Etsuo Mizukami | Yoshinori Shiga | Shinichi Kawamoto | Hisashi Kawai | Satoshi Nakamura
Proceedings of the SIGDIAL 2011 Conference

2010

pdf bib
Modeling Spoken Decision Making Dialogue and Optimization of its Dialogue Strategy
Teruhisa Misu | Komei Sugiura | Kiyonori Ohtake | Chiori Hori | Hideki Kashioka | Hisashi Kawai | Satoshi Nakamura
Proceedings of the SIGDIAL 2010 Conference

pdf bib
Dialogue Acts Annotation for NICT Kyoto Tour Dialogue Corpus to Construct Statistical Dialogue Systems
Kiyonori Ohtake | Teruhisa Misu | Chiori Hori | Hideki Kashioka | Satoshi Nakamura
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a new corpus of consulting dialogues designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions from the tagged dialogue corpus. We have collected more than 150 hours of consulting dialogues in the tourist guidance domain. We are developing the corpus, which consists of speech, transcripts, speech act (SA) tags, morphological analysis results, dependency analysis results, and semantic content tags. This paper outlines our taxonomy of dialogue act (DA) annotation that can describe two aspects of an utterance: the communicative function (SA), and the semantic content of the utterance. We provide an overview of the Kyoto tour dialogue corpus and a preliminary analysis using the DA tags. We also show the result of a preliminary experiment on SA tagging via Support Vector Machines (SVMs). We describe the current state of the corpus development. In addition, we mention the use of our corpus for the spoken dialogue system that is being developed.

2009

pdf bib
Network-based speech-to-speech translation
Chiori Hori | Sakriani Sakti | Michael Paul | Noriyuki Kimura | Yutaka Ashikari | Ryosuke Isotani | Eiichiro Sumita | Satoshi Nakamura
Proceedings of the 6th International Workshop on Spoken Language Translation: Papers

This demo shows a network-based speech-to-speech translation system. The system was designed to perform real-time, location-free, multi-party translation between speakers of different languages. The spoken language modules, namely automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS), are connected through Web servers that can be accessed via client applications worldwide. In this demo, we will show the multi-party speech-to-speech translation of Japanese, Chinese, Indonesian, Vietnamese, and English, provided by the NICT server. These speech-to-speech modules have been developed by NICT as a part of the A-STAR (Asian Speech Translation Advanced Research) consortium project.

pdf bib
On the Importance of Pivot Language Selection for Statistical Machine Translation
Michael Paul | Hirofumi Yamamoto | Eiichiro Sumita | Satoshi Nakamura
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Annotating Dialogue Acts to Construct Dialogue Systems for Consulting
Kiyonori Ohtake | Teruhisa Misu | Chiori Hori | Hideki Kashioka | Satoshi Nakamura
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

pdf bib
Construction of Chinese Segmented and POS-tagged Conversational Corpora and Their Evaluations on Spontaneous Speech Recognitions
Xinhui Hu | Ryosuke Isotani | Satoshi Nakamura
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

pdf bib
Multilingual Mobile-Phone Translation Services for World Travelers
Michael Paul | Hideo Okuma | Hirofumi Yamamoto | Eiichiro Sumita | Shigeki Matsuda | Tohru Shimizu | Satoshi Nakamura
Coling 2008: Companion volume: Demonstrations

pdf bib
Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project
Sakriani Sakti | Eka Kelana | Hammam Riza | Shinsuke Sakai | Konstantin Markov | Satoshi Nakamura
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)

pdf bib
Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: newest Part of the CENSREC Series
Takanobu Nishiura | Masato Nakayama | Yuki Denda | Norihide Kitaoka | Kazumasa Yamamoto | Takeshi Yamada | Satoru Tsuge | Chiyomi Miyajima | Masakiyo Fujimoto | Tetsuya Takiguchi | Satoshi Tamura | Shingo Kuroiwa | Kazuya Takeda | Satoshi Nakamura
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases. The focus is now on improving performance under realistic environments such as noisy conditions. Since October 2001, our working group of the Information Processing Society in Japan has been working on evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks including databases and evaluation tools called CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we newly introduce a collection of databases and evaluation tools named CENSREC-4, which is an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition. The results of evaluation experiments proved that CENSREC-4 is an effective database suitable for evaluating the new dereverberation method because the traditional dereverberation process had difficulty sufficiently improving the recognition performance. The framework was released in March 2008, and many studies are being conducted with it in Japan.

2007

pdf bib
NICT-ATR Speech-to-Speech Translation System
Eiichiro Sumita | Tohru Shimizu | Satoshi Nakamura
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf bib
Oriental COCOSDA: Past, Present and Future
Shuichi Itahashi | Chiu-yu Tseng | Satoshi Nakamura
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The purpose of Oriental COCOSDA is to exchange ideas, to share information and to discuss regional matters on the creation, utilization, and dissemination of spoken language corpora of oriental languages and on the assessment methods of speech recognition/synthesis systems, as well as to promote speech research on oriental languages. A series of International Workshops on East Asian Language Resources and Evaluation (EALREW), or Oriental COCOSDA Workshops, has been held annually since the preparatory meeting held in 1997. After that, we have had a series of workshops every year in Japan, Taiwan, China, Korea, Thailand, Singapore, India and Indonesia. Oriental COCOSDA is managed by a convener, three advisory members, and 21 representatives from ten regions in Oriental countries. We need much more Pan-Asia collaboration with research organizations and consortia, though there are some domestic activities in Oriental countries. We note that speech research has gradually become popular in Oriental countries and regions including Malaysia, Vietnam, and the Xinjiang Uygur Autonomous Region of China. We plan to hold future Oriental COCOSDA meetings in these places in order to promote speech research there.

pdf bib
Development of client-server speech translation system on a multi-lingual speech communication platform
Tohru Shimizu | Yutaka Ashikari | Eiichiro Sumita | Hideki Kashioka | Satoshi Nakamura
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

2004

pdf bib
Multi-lingual speech recognition system for speech-to-speech translation
Satoshi Nakamura | Konstantin Markov | Takatoshi Jitsuhiro | Jin-Song Zhang | Hirofumi Yamamoto | Genichiro Kikui
Proceedings of the First International Workshop on Spoken Language Translation: Papers

2002

pdf bib
The Present Status of Speech Database in Japan: Development, Management, and Application to Speech Research
Hisao Kuwabara | Shuichi Itahashi | Mikio Yamamoto | Toshiyuki Takezawa | Satoshi Nakamura | Kazuya Takeda
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

pdf bib
Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition
Satoshi Nakamura | Kazuo Hiyane | Futoshi Asano | Takanobu Nishiura | Takeshi Yamada
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
