Ryo Masumura


2020

Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model
Mana Ihori | Ryo Masumura | Naoki Makishima | Tomohiro Tanaka | Akihiko Takashima | Shota Orihashi
Proceedings of the 13th International Conference on Natural Language Generation

This paper presents a novel fusion method for integrating an external language model (LM) into the Transformer-based sequence-to-sequence (seq2seq) model. While paired data are generally required to train the seq2seq model, the external LM can be trained with only unpaired data. Thus, since it is hard to prepare a large amount of paired data, it is important to leverage the knowledge memorized in the external LM when building the seq2seq model. However, existing fusion methods assume that the LM is integrated with recurrent neural network-based seq2seq models rather than the Transformer. Therefore, this paper proposes a fusion method that can explicitly utilize network structures in the Transformer. The proposed method, called memory attentive fusion, leverages the Transformer-style attention mechanism, repeating source-target attention in a multi-hop manner to read the knowledge memorized in the LM. Our experiments on two text-style conversion tasks demonstrate that the proposed method performs better than conventional fusion methods.
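The abstract gives no equations, but the core device, repeating source-target attention over the external LM's hidden states, can be loosely sketched as follows. Everything here (the dot-product scoring, the residual update, the function names and shapes) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_hop_memory_attention(decoder_state, lm_memory, num_hops=2):
    """Repeatedly attend over external-LM hidden states (the 'memory'),
    refining the query with each hop's context vector."""
    d = decoder_state.shape[-1]
    query = decoder_state
    for _ in range(num_hops):
        scores = lm_memory @ query / np.sqrt(d)   # one score per memory slot
        weights = softmax(scores)                 # attention distribution over slots
        context = weights @ lm_memory             # weighted sum of LM states
        query = query + context                   # residual update feeds the next hop
    return query

rng = np.random.default_rng(0)
fused = multi_hop_memory_attention(rng.standard_normal(8),
                                   rng.standard_normal((5, 8)))
```

Each hop reads the memory again with a query already conditioned on what the previous hop retrieved, which is the "multi-hop" reading the abstract describes.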

Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs
Takashi Kodama | Ryuichiro Higashinaka | Koh Mitsuda | Ryo Masumura | Yushi Aono | Ryuta Nakamura | Noritake Adachi | Hidetoshi Kawabata
Proceedings of the 12th Language Resources and Evaluation Conference

This paper concerns the problem of realizing consistent personalities in neural conversational modeling by using user-generated question-answer pairs as training data. Using the framework of role play-based question answering, we collected single-turn question-answer pairs for particular characters from online users. Meta information related to the question-answer pairs, such as emotion and intimacy, was also collected. We verified the quality of the collected data and, by subjective evaluation, also verified their usefulness in training neural conversational models to generate utterances that reflect the meta information, especially emotion.

Parallel Corpus for Japanese Spoken-to-Written Style Conversion
Mana Ihori | Akihiko Takashima | Ryo Masumura
Proceedings of the 12th Language Resources and Evaluation Conference

With the increase of automatic speech recognition (ASR) applications, spoken-to-written style conversion, which transforms spoken-style text into written-style text, is becoming an important technology for increasing the readability of ASR transcriptions. To establish such conversion technology, a parallel corpus of spoken-style text and written-style text is beneficial because it can be utilized for building end-to-end neural sequence transformation models. Spoken-to-written style conversion involves multiple conversion problems, including punctuation restoration, disfluency detection, and simplification. However, most existing corpora tend to be made for just one of these conversion problems. In addition, in Japanese, we have to consider not only general spoken-to-written style conversion problems but also Japanese-specific ones, such as language style unification (e.g., polite, frank, and direct styles) and restoration of omitted postpositional particle expressions. Therefore, we created a new Japanese parallel corpus of spoken-style text and written-style text that can simultaneously handle the general problems and the Japanese-specific ones. To build this corpus, we prepared four types of spoken-style text and utilized a crowdsourcing service for manually converting them into written-style text. This paper describes the construction of the corpus and reports baseline results of spoken-to-written style conversion using the latest neural sequence transformation models.

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus
Yuki Yamashita | Tomoki Koriyama | Yuki Saito | Shinnosuke Takamichi | Yusuke Ijima | Ryo Masumura | Hiroshi Saruwatari
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis. DNN-based frameworks typically use linguistic information, called context, as input features instead of directly using text. In such frameworks, we can synthesize not only reading-style speech but also speech with paralinguistic and nonlinguistic features by adding such information to the context. However, it is not clear what kind of information is crucial for reproducing paralinguistic and nonlinguistic features. Therefore, we investigate the effectiveness of rich tags in DNN-based speech synthesis using the Corpus of Spontaneous Japanese (CSJ), which contains a large amount of annotations on paralinguistic features such as prosody, disfluency, and morphological features. Experimental evaluation results show that the reproducibility of paralinguistic features of synthetic speech was enhanced by adding such information as context.

2018

Neural Dialogue Context Online End-of-Turn Detection
Ryo Masumura | Tomohiro Tanaka | Atsushi Ando | Ryo Ishii | Ryuichiro Higashinaka | Yushi Aono
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

This paper proposes a fully neural-network-based, dialogue-context-aware online end-of-turn detection method that can utilize long-range interactive information extracted from both the speaker’s utterances and the collocutor’s utterances. The proposed method combines multiple time-asynchronous long short-term memory recurrent neural networks, which capture the speaker’s and collocutor’s sequential features and their interactions. Assuming application to spoken dialogue systems, we introduce the speaker’s acoustic sequential features and the collocutor’s linguistic sequential features, each of which can be extracted in an online manner. Our evaluation confirms the effectiveness of taking into consideration the dialogue context formed by the speaker’s and collocutor’s utterances.

Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling
Ryo Masumura | Tomohiro Tanaka | Ryuichiro Higashinaka | Hirokazu Masataki | Yushi Aono
Proceedings of the 27th International Conference on Computational Linguistics

This paper is an initial study on multi-task and multi-lingual joint learning for lexical utterance classification. A major problem in constructing lexical utterance classification modules for spoken dialogue systems is that individual data resources are often limited or unbalanced among tasks and/or languages. Various studies have examined joint learning using neural-network based shared modeling; however, previous joint learning studies focused on either cross-task or cross-lingual knowledge transfer. In order to simultaneously support both multi-task and multi-lingual joint learning, our idea is to explicitly divide state-of-the-art neural lexical utterance classification into language-specific components that can be shared between different tasks and task-specific components that can be shared between different languages. In addition, in order to effectively transfer knowledge between different task data sets and different language data sets, this paper proposes a partially-shared modeling method that possesses both shared components and components specific to individual data sets. We demonstrate the effectiveness of the proposed method using Japanese and English data sets with three different lexical utterance classification tasks.

Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification
Ryo Masumura | Yusuke Shinohara | Ryuichiro Higashinaka | Yushi Aono
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper proposes an adversarial training method for the multi-task and multi-lingual joint modeling needed for utterance intent classification. In joint modeling, common knowledge can be efficiently utilized among multiple tasks or multiple languages. This is achieved by introducing both language-specific networks shared among different tasks and task-specific networks shared among different languages. However, the shared networks often become specialized in the majority tasks or languages, so performance degradation must be expected for some minority data sets. In order to improve the invariance of the shared networks, the proposed method introduces both language-specific task adversarial networks and task-specific language adversarial networks; both are leveraged for purging the task or language dependencies of the shared networks. The effectiveness of the adversarial training proposal is demonstrated using Japanese and English data sets for three different utterance intent classification tasks.
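A common device for making shared networks invariant under this kind of adversarial training is a gradient reversal layer: it passes activations through unchanged but flips the sign of the gradient flowing back from the task/language discriminator, so the shared encoder learns to confuse it. The minimal sketch below illustrates that general idea only; the class, its `lambda_` scaling factor, and the hand-written backward pass are assumptions, not the paper's implementation.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; negates (and scales) the gradient in
    the backward pass, so the layers below are trained *against* the
    discriminator's objective."""
    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, x):
        return x                          # no change to activations

    def backward(self, grad_output):
        return -self.lambda_ * grad_output  # reversed, scaled gradient

grl = GradientReversal(lambda_=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                 # identical to x
g = grl.backward(np.ones_like(x))  # each gradient entry becomes -0.5
```

Placed between a shared network and an adversarial classifier, this single sign flip is what turns an ordinary classification loss into a purging pressure on the shared representation.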

2017

Hyperspherical Query Likelihood Models with Word Embeddings
Ryo Masumura | Taichi Asami | Hirokazu Masataki | Kugatsu Sadamitsu | Kyosuke Nishida | Ryuichiro Higashinaka
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper presents an initial study on hyperspherical query likelihood models (QLMs) for information retrieval (IR). Our motivation is to naturally utilize pre-trained word embeddings for probabilistic IR. To this end, the key idea is to directly leverage the word embeddings as random variables in directional probabilistic models based on von Mises-Fisher distributions, which are closely related to cosine distance. The proposed method enables us to take semantic similarities between documents and target queries into consideration in a theoretically grounded way, without introducing heuristic expansion techniques. In addition, this paper reveals relationships between hyperspherical QLMs and conventional QLMs. Experiments on document retrieval compare a hyperspherical QLM to conventional QLMs and to document distance metrics using word or document embeddings.
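As a rough illustration of the directional-model idea (not the paper's exact estimator): a von Mises-Fisher distribution on the unit sphere has log-density κ·(μ·w) for a unit vector w, plus a normalizer that is constant across documents for fixed κ, so documents can be ranked by the summed cosine scores of the query-word embeddings against each document's mean direction. The function names and the mean-direction estimate below are illustrative assumptions.

```python
import numpy as np

def unit(v):
    """Project vectors onto the unit hypersphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def vmf_query_score(query_vecs, doc_vecs, kappa=10.0):
    """Rank-equivalent vMF log-likelihood of a query under a document model:
    the document's mean direction plays the role of the vMF mean mu, and each
    unit-norm query-word embedding w contributes kappa * cos(mu, w).
    The kappa-dependent normalizer is dropped, being constant across documents."""
    mu = unit(doc_vecs.sum(axis=0))       # document mean direction on the sphere
    q = unit(query_vecs)                  # unit-norm query word embeddings
    return float(kappa * (q @ mu).sum())  # sum of per-word scores (up to a constant)

rng = np.random.default_rng(1)
score = vmf_query_score(rng.standard_normal((3, 8)),    # 3 query words, dim 8
                        rng.standard_normal((20, 8)))   # 20 document words
```

Because the score is a sum of cosine similarities, this makes explicit how a vMF-based QLM reduces semantic matching to the geometry of the embedding sphere.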

Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels
Itsumi Saito | Jun Suzuki | Kyosuke Nishida | Kugatsu Sadamitsu | Satoshi Kobashikawa | Ryo Masumura | Yuji Matsumoto | Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective in many natural language generation tasks. In general, a large amount of training data is needed to train an encoder-decoder model; unlike machine translation, however, little training data is available for text-normalization tasks. In this paper, we propose two methods for generating augmented data. The experimental results on Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve higher BLEU scores than the baselines. We also investigated the oracle performance and revealed that there is still substantial room for improving the encoder-decoder model.

2015

Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks
Ryo Masumura | Taichi Asami | Takanobu Oba | Hirokazu Masataki | Sumitaka Sakauchi | Akinori Ito
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Discourse Relation Recognition by Comparing Various Units of Sentence Expression with Recursive Neural Network
Atsushi Otsuka | Toru Hirano | Chiaki Miyazaki | Ryo Masumura | Ryuichiro Higashinaka | Toshiro Makino | Yoshihiro Matsuo
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation