2023
Retrieval, Masking, and Generation: Feedback Comment Generation using Masked Comment Examples
Mana Ihori | Hiroshi Sato | Tomohiro Tanaka | Ryo Masumura
Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges
In this paper, we propose a novel method, retrieval, masking, and generation, for feedback comment generation. Feedback comment generation is a task in which a system generates feedback comments, such as hints or explanatory notes, for language learners, given an input text and a position indicating where to comment. In previous work, the retrieve-and-edit method, which retrieves feedback comments from a data pool and edits them, was considered effective for this task. However, it does not perform as well as other conventional methods because its model learns to edit tokens in the retrieved comments that do not need to be rewritten. To mitigate this problem, we propose a method that combines retrieval, masking, and generation, building on the retrieve-and-edit approach. Specifically, tokens of feedback comments retrieved from the data pool are masked, and the masked feedback comment is used as a template for generating a new feedback comment. By masking retrieved feedback comments rather than using them directly, the proposed method should prevent unnecessary edits. Our experiments on feedback comment generation demonstrate that the proposed method outperforms conventional methods.
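The abstract does not spell out the masking criterion, so the Python sketch below is only an illustration of the general idea, with a hypothetical rule (mask comment tokens copied from the retrieved example's own source sentence) standing in for whatever rule the paper actually uses.

```python
# Hypothetical sketch of the masking step; the actual masking rule used in
# the paper is not given in the abstract.

MASK = "<mask>"  # placeholder slot to be filled by the generator

def make_masked_template(retrieved_comment: list[str],
                         retrieval_source: list[str]) -> list[str]:
    """Mask comment tokens copied from the retrieved example's own source text,
    so only those example-specific slots are rewritten for the new input."""
    source_vocab = {t.lower() for t in retrieval_source}
    return [MASK if tok.lower() in source_vocab else tok
            for tok in retrieved_comment]

# The masked comment serves as a template; a seq2seq generator then fills the
# masked slots conditioned on the new input text and the comment position.
template = make_masked_template(
    retrieved_comment=["use", "the", "past", "tense", "of", "eat"],
    retrieval_source=["I", "eat", "breakfast", "yesterday", "."],
)
print(template)  # ['use', 'the', 'past', 'tense', 'of', '<mask>']
```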
2022
Multimodal Negotiation Corpus with Various Subjective Assessments for Social-Psychological Outcome Prediction from Non-Verbal Cues
Nobukatsu Hojo | Satoshi Kobashikawa | Saki Mizuno | Ryo Masumura
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This study investigates social-psychological negotiation-outcome prediction (SPNOP), a novel task for estimating various subjective evaluation scores of a negotiation, such as satisfaction and trust, from negotiation dialogue data. To investigate SPNOP, a corpus with various psychological measurements is beneficial because the interaction process of negotiation relates to many aspects of psychology. However, current negotiation corpora include only information related to objective outcomes or to a single aspect of psychology. In addition, most use a “laboratory setting” with non-skilled negotiators and oversimplified negotiation scenarios. Such a gap from actual negotiation may intrinsically affect the behavior and psychology of the negotiators in the corpus, which can degrade the performance of models trained on the corpus in real situations. Therefore, we created a negotiation corpus with three features: 1) it is assessed with various psychological measurements, 2) it uses skilled negotiators, and 3) it uses context-rich negotiation scenarios. We recorded video and audio of negotiations in Japanese to investigate SPNOP in the context of social signal processing. Experimental results indicate that social-psychological outcomes can be effectively estimated from multimodal information.
Multi-Perspective Document Revision
Mana Ihori | Hiroshi Sato | Tomohiro Tanaka | Ryo Masumura
Proceedings of the 29th International Conference on Computational Linguistics
This paper presents a novel multi-perspective document revision task. In conventional studies on document revision, tasks such as grammatical error correction, sentence reordering, and discourse relation classification have been performed individually; however, these perspectives should be revised simultaneously to improve the readability and clarity of a whole document. Thus, our study defines multi-perspective document revision as a task that revises multiple perspectives simultaneously. To model the task, we design a novel Japanese multi-perspective document revision dataset that simultaneously handles seven perspectives for improving the readability and clarity of a document. Although a large amount of data that simultaneously covers multiple perspectives is needed to model multi-perspective document revision elaborately, such data is difficult to prepare in quantity. Therefore, our study offers a multi-perspective document revision modeling method that can use a limited amount of matched data (i.e., data for the multi-perspective document revision task) together with external partially-matched data (e.g., data for the grammatical error correction task). Experiments using our created dataset demonstrate the effectiveness of using multiple partially-matched datasets to model the multi-perspective document revision task.
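As a rough illustration of how matched and partially-matched data might be combined, the sketch below mixes the two sources into one training set and marks each example with a perspective tag; the tag names and mixing ratio are assumptions, not details taken from the paper.

```python
# Hypothetical data-mixing sketch: one seq2seq model is trained on fully
# matched multi-perspective pairs plus external partially-matched pairs
# (e.g., grammatical error correction), each prefixed with a perspective tag.

import random

def build_training_mixture(matched, partially_matched, partial_ratio=0.5):
    """Combine matched and partially-matched (source, target) pairs.

    matched: list of (src, tgt) pairs covering all seven perspectives.
    partially_matched: dict mapping a perspective tag (e.g. "<gec>")
        to a list of (src, tgt) pairs covering only that perspective.
    """
    examples = [("<all>", src, tgt) for src, tgt in matched]
    for tag, pairs in partially_matched.items():
        n = int(len(pairs) * partial_ratio)
        examples += [(tag, src, tgt) for src, tgt in random.sample(pairs, n)]
    random.shuffle(examples)
    # The tag is prepended to the source so the model knows which
    # perspectives it is expected to revise for this example.
    return [(f"{tag} {src}", tgt) for tag, src, tgt in examples]
```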
2020
Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs
Takashi Kodama | Ryuichiro Higashinaka | Koh Mitsuda | Ryo Masumura | Yushi Aono | Ryuta Nakamura | Noritake Adachi | Hidetoshi Kawabata
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper concerns the problem of realizing consistent personalities in neural conversational modeling by using user-generated question-answer pairs as training data. Using the framework of role play-based question answering, we collected single-turn question-answer pairs for particular characters from online users. Meta information related to the question-answer pairs, such as emotion and intimacy, was also collected. We verified the quality of the collected data and, through subjective evaluation, also verified its usefulness in training neural conversational models to generate utterances that reflect the meta information, especially emotion.
Parallel Corpus for Japanese Spoken-to-Written Style Conversion
Mana Ihori | Akihiko Takashima | Ryo Masumura
Proceedings of the Twelfth Language Resources and Evaluation Conference
With the increase of automatic speech recognition (ASR) applications, spoken-to-written style conversion, which transforms spoken-style text into written-style text, is becoming an important technology for increasing the readability of ASR transcriptions. To establish such conversion technology, a parallel corpus of spoken-style text and written-style text is beneficial because it can be utilized for building end-to-end neural sequence transformation models. Spoken-to-written style conversion involves multiple conversion problems, including punctuation restoration, disfluency detection, and simplification. However, most existing corpora are made for just one of these conversion problems. In addition, in Japanese, we have to consider not only general spoken-to-written style conversion problems but also Japanese-specific ones, such as language style unification (e.g., polite, frank, and direct styles) and restoration of omitted postpositional particle expressions. Therefore, we created a new Japanese parallel corpus of spoken-style text and written-style text that simultaneously handles both general problems and Japanese-specific ones. To build this corpus, we prepared four types of spoken-style text and utilized a crowdsourcing service to manually convert them into written-style text. This paper describes the setup for building the corpus and reports baseline results of spoken-to-written style conversion using the latest neural sequence transformation models.
DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus
Yuki Yamashita | Tomoki Koriyama | Yuki Saito | Shinnosuke Takamichi | Yusuke Ijima | Ryo Masumura | Hiroshi Saruwatari
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis. DNN-based frameworks typically use linguistic information, called context, as input features instead of directly using text. In such frameworks, we can synthesize not only reading-style speech but also speech with paralinguistic and nonlinguistic features by adding such information to the context. However, it is not clear what kind of information is crucial for reproducing paralinguistic and nonlinguistic features. Therefore, we investigate the effectiveness of rich tags in DNN-based speech synthesis using the Corpus of Spontaneous Japanese (CSJ), which has a large amount of annotation on paralinguistic features such as prosody and disfluency, as well as morphological features. Experimental evaluation results show that the reproducibility of paralinguistic features of synthetic speech was enhanced by adding such information to the context.
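For intuition only, the sketch below shows how CSJ-style tags could be appended to a standard linguistic context vector before it is fed to the acoustic model; the specific feature names and tag set are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: extend the usual linguistic context vector for one
# phoneme/frame with extra flags derived from CSJ-style annotations, so the
# acoustic model can condition on paralinguistic information.

import numpy as np

def build_context(linguistic_feats: np.ndarray,
                  is_filled_pause: bool,
                  is_disfluent: bool,
                  boundary_tone_id: int,
                  num_tone_classes: int = 5) -> np.ndarray:
    tone_onehot = np.zeros(num_tone_classes, dtype=np.float32)
    tone_onehot[boundary_tone_id] = 1.0
    tag_feats = np.array([float(is_filled_pause), float(is_disfluent)],
                         dtype=np.float32)
    # Final DNN input = standard linguistic context + paralinguistic tag flags.
    return np.concatenate([linguistic_feats, tag_feats, tone_onehot])
```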
Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model
Mana Ihori | Ryo Masumura | Naoki Makishima | Tomohiro Tanaka | Akihiko Takashima | Shota Orihashi
Proceedings of the 13th International Conference on Natural Language Generation
This paper presents a novel fusion method for integrating an external language model (LM) into a Transformer-based sequence-to-sequence (seq2seq) model. While paired data are basically required to train a seq2seq model, an external LM can be trained with unpaired data alone. Thus, since it is hard to prepare a large amount of paired data, it is important to leverage the knowledge memorized in the external LM when building the seq2seq model. However, existing fusion methods assume that the LM is integrated with recurrent neural network-based seq2seq models rather than with the Transformer. Therefore, this paper proposes a fusion method that can explicitly utilize network structures in the Transformer. The proposed method, called memory attentive fusion, leverages the Transformer-style attention mechanism, which repeats source-target attention in a multi-hop manner, to read the knowledge memorized in the LM. Our experiments on two text-style conversion tasks demonstrate that the proposed method performs better than conventional fusion methods.
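The following PyTorch sketch is one interpretation of the described architecture, not the authors' implementation: a Transformer decoder layer extended with an additional attention hop over the external LM's hidden states ("memory"), applied after the usual source-target attention.

```python
# Interpretation-only sketch of a memory-attentive decoder layer.

import torch
import torch.nn as nn

class MemoryAttentiveDecoderLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.mem_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, tgt, enc_out, lm_memory, tgt_mask=None):
        # Standard masked self-attention over the partial output sequence.
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # Usual source-target attention over the encoder outputs.
        x = self.norms[1](x + self.src_attn(x, enc_out, enc_out)[0])
        # Extra hop: read the external LM's hidden states as a memory.
        x = self.norms[2](x + self.mem_attn(x, lm_memory, lm_memory)[0])
        return self.norms[3](x + self.ff(x))
```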
2018
Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling
Ryo Masumura | Tomohiro Tanaka | Ryuichiro Higashinaka | Hirokazu Masataki | Yushi Aono
Proceedings of the 27th International Conference on Computational Linguistics
This paper is an initial study on multi-task and multi-lingual joint learning for lexical utterance classification. A major problem in constructing lexical utterance classification modules for spoken dialogue systems is that individual data resources are often limited or unbalanced among tasks and/or languages. Various studies have examined joint learning using neural-network-based shared modeling; however, previous joint learning studies focused on either cross-task or cross-lingual knowledge transfer. In order to support multi-task and multi-lingual joint learning simultaneously, our idea is to explicitly divide state-of-the-art neural lexical utterance classification into language-specific components that can be shared between different tasks and task-specific components that can be shared between different languages. In addition, in order to effectively transfer knowledge between different task data sets and different language data sets, this paper proposes a partially-shared modeling method that possesses both shared components and components specific to individual data sets. We demonstrate the effectiveness of the proposed method using Japanese and English data sets with three different lexical utterance classification tasks.
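A minimal sketch of one way the partially-shared idea could be laid out (encoder types and sizes are assumptions): each input passes through a language-specific encoder shared across tasks, a task-specific encoder shared across languages, and a data-set-specific encoder, whose outputs are concatenated for classification.

```python
# Hypothetical layout of a partially-shared classifier, not the paper's code.

import torch
import torch.nn as nn

class PartiallySharedClassifier(nn.Module):
    def __init__(self, vocab_sizes, languages, tasks, datasets, num_labels,
                 emb=128, hidden=128):
        super().__init__()
        self.embeds = nn.ModuleDict({l: nn.Embedding(vocab_sizes[l], emb) for l in languages})
        # Language-specific encoders are shared across tasks; task-specific
        # encoders are shared across languages; data-set encoders are private.
        self.lang_enc = nn.ModuleDict({l: nn.GRU(emb, hidden, batch_first=True) for l in languages})
        self.task_enc = nn.ModuleDict({t: nn.GRU(emb, hidden, batch_first=True) for t in tasks})
        self.data_enc = nn.ModuleDict({d: nn.GRU(emb, hidden, batch_first=True) for d in datasets})
        self.heads = nn.ModuleDict({d: nn.Linear(3 * hidden, num_labels[d]) for d in datasets})

    def forward(self, tokens, lang, task, dataset):
        x = self.embeds[lang](tokens)
        _, h_lang = self.lang_enc[lang](x)
        _, h_task = self.task_enc[task](x)
        _, h_data = self.data_enc[dataset](x)
        feat = torch.cat([h_lang[-1], h_task[-1], h_data[-1]], dim=-1)
        return self.heads[dataset](feat)
```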
Neural Dialogue Context Online End-of-Turn Detection
Ryo Masumura | Tomohiro Tanaka | Atsushi Ando | Ryo Ishii | Ryuichiro Higashinaka | Yushi Aono
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
This paper proposes a fully neural-network-based, dialogue-context-aware online end-of-turn detection method that can utilize long-range interactive information extracted from both the speaker’s utterances and the collocutor’s utterances. The proposed method combines multiple time-asynchronous long short-term memory recurrent neural networks, which can capture the speaker’s and collocutor’s multiple sequential features and their interactions. Assuming the proposed method will be applied to spoken dialogue systems, we introduce the speaker’s acoustic sequential features and the collocutor’s linguistic sequential features, each of which can be extracted in an online manner. Our evaluation confirms the effectiveness of taking into consideration the dialogue context formed by the speaker’s and collocutor’s utterances.
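Purely as an illustration of combining the two feature streams (dimensions and the fusion step are assumptions), the sketch below runs one LSTM over the speaker's acoustic frames and another over the collocutor's word sequence, then fuses their latest states to score end-of-turn.

```python
# Hypothetical two-stream end-of-turn scorer; not the paper's exact model.

import torch
import torch.nn as nn

class EndOfTurnDetector(nn.Module):
    def __init__(self, acoustic_dim=40, vocab_size=10000, emb=100, hidden=128):
        super().__init__()
        self.acoustic_lstm = nn.LSTM(acoustic_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, emb)
        self.linguistic_lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, speaker_frames, collocutor_words):
        # The two sequences are time-asynchronous; each LSTM is updated at its
        # own rate, and only the most recent hidden states are combined here.
        _, (h_a, _) = self.acoustic_lstm(speaker_frames)
        _, (h_l, _) = self.linguistic_lstm(self.embed(collocutor_words))
        fused = torch.cat([h_a[-1], h_l[-1]], dim=-1)
        return torch.sigmoid(self.out(fused))  # probability that the turn ends now
```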
Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification
Ryo Masumura | Yusuke Shinohara | Ryuichiro Higashinaka | Yushi Aono
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
This paper proposes an adversarial training method for the multi-task and multi-lingual joint modeling needed for utterance intent classification. In joint modeling, common knowledge can be efficiently utilized across multiple tasks or multiple languages. This is achieved by introducing both language-specific networks shared among different tasks and task-specific networks shared among different languages. However, the shared networks are often specialized toward the majority tasks or languages, so performance degradation must be expected for some minority data sets. In order to improve the invariance of the shared networks, the proposed method introduces both language-specific task adversarial networks and task-specific language adversarial networks; both are leveraged to purge the task or language dependencies of the shared networks. The effectiveness of the proposed adversarial training is demonstrated using Japanese and English data sets for three different utterance intent classification tasks.
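The abstract does not state how the adversarial objective is implemented; a gradient reversal layer is one standard way to build such task/language adversarial branches, sketched below for illustration only.

```python
# Illustrative gradient reversal layer; the paper's exact mechanism may differ.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale=1.0):
        ctx.scale = scale
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse gradients so the shared encoder is trained to *remove*
        # task/language information that the adversary can exploit.
        return -ctx.scale * grad_output, None

def grad_reverse(x, scale=1.0):
    return GradReverse.apply(x, scale)

# Usage sketch:
#   shared_feat = shared_encoder(x)
#   adv_logits = language_discriminator(grad_reverse(shared_feat))
#   loss = intent_loss + adversary_loss  # adversary learns, encoder un-learns
```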
2017
Hyperspherical Query Likelihood Models with Word Embeddings
Ryo Masumura | Taichi Asami | Hirokazu Masataki | Kugatsu Sadamitsu | Kyosuke Nishida | Ryuichiro Higashinaka
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
This paper presents an initial study on hyperspherical query likelihood models (QLMs) for information retrieval (IR). Our motivation is to naturally utilize pre-trained word embeddings for probabilistic IR. To this end, the key idea is to directly treat the word embeddings as random variables in directional probabilistic models based on von Mises-Fisher distributions, which are closely related to cosine similarity. The proposed method enables us to theoretically take semantic similarities between documents and target queries into consideration without introducing heuristic expansion techniques. In addition, this paper reveals relationships between hyperspherical QLMs and conventional QLMs. Experiments present document retrieval evaluation results in which a hyperspherical QLM is compared with conventional QLMs and with document distance metrics using word or document embeddings.
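For reference, the standard von Mises-Fisher density that such directional models build on is shown below; this is the textbook form, not an equation taken from the paper.

```latex
% von Mises-Fisher density over unit vectors x on the (d-1)-sphere,
% with mean direction \mu (\|\mu\| = 1) and concentration \kappa \ge 0:
\[
  f(\mathbf{x}\mid\boldsymbol{\mu},\kappa)
  = C_d(\kappa)\,\exp\!\bigl(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\bigr),
  \qquad
  C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\]
% where I_\nu is the modified Bessel function of the first kind. The exponent
% depends only on \mu^\top x, i.e., on cosine similarity for unit vectors,
% which is why vMF models pair naturally with word embeddings.
```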
Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels
Itsumi Saito | Jun Suzuki | Kyosuke Nishida | Kugatsu Sadamitsu | Satoshi Kobashikawa | Ryo Masumura | Yuji Matsumoto | Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective in many natural language generation tasks, such as machine translation and summarization. In general, a large amount of training data is required to train an encoder-decoder model. Unlike machine translation, however, there are few training data for text normalization tasks. In this paper, we propose two methods for generating augmented data. The experimental results on Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve higher BLEU scores than the baselines. We also investigated the oracle performance and revealed that there is still sufficient room for improving the encoder-decoder model.
2015
Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks
Ryo Masumura | Taichi Asami | Takanobu Oba | Hirokazu Masataki | Sumitaka Sakauchi | Akinori Ito
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Discourse Relation Recognition by Comparing Various Units of Sentence Expression with Recursive Neural Network
Atsushi Otsuka | Toru Hirano | Chiaki Miyazaki | Ryo Masumura | Ryuichiro Higashinaka | Toshiro Makino | Yoshihiro Matsuo
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation