Tatsuya Ishigaki


pdf bib
Automating Horizon Scanning in Future Studies
Tatsuya Ishigaki | Suzuko Nishino | Sohei Washino | Hiroki Igarashi | Yukari Nagai | Yuichi Washida | Akihiko Murai
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We introduce document retrieval and comment generation tasks for automating horizon scanning. This is an important task in the field of futurology that collects sufficient information for predicting drastic societal changes in the mid- or long-term future. The steps used are: 1) retrieving news articles that imply drastic changes, and 2) writing subjective comments on each article for others’ ease of understanding. As a first step in automating these tasks, we create a dataset that contains 2,266 manually collected news articles with comments written by experts. We analyze the collected documents and comments regarding characteristic words, the distance to general articles, and contents in the comments. Furthermore, we compare several methods for automating horizon scanning. Our experiments show that 1) manually collected articles are different from general articles regarding the words used and semantic distances, 2) the contents in the comment can be classified into several categories, and 3) a supervised model trained on our dataset achieves a better performance. The contributions are: 1) we propose document retrieval and comment generation tasks for horizon scanning, 2) create and analyze a new dataset, and 3) report the performance of several models and show that comment generation tasks are challenging.


pdf bib
Generating Racing Game Commentary from Vision, Language, and Structured Data
Tatsuya Ishigaki | Goran Topic | Yumi Hamazono | Hiroshi Noji | Ichiro Kobayashi | Yusuke Miyao | Hiroya Takamura
Proceedings of the 14th International Conference on Natural Language Generation

We propose the task of automatically generating commentaries for races in a motor racing game, from vision, structured numerical, and textual data. Commentaries provide information to support spectators in understanding events in races. Commentary generation models need to interpret the race situation and generate the correct content at the right moment. We divide the task into two subtasks: utterance timing identification and utterance generation. Because existing datasets do not have such alignments of data in multiple modalities, this setting has not been explored in depth. In this study, we introduce a new large-scale dataset that contains aligned video data, structured numerical data, and transcribed commentaries that consist of 129,226 utterances in 1,389 races in a game. Our analysis reveals that the characteristics of commentaries change over time or from viewpoints. Our experiments on the subtasks show that it is still challenging for a state-of-the-art vision encoder to capture useful information from videos to generate accurate commentaries. We make the dataset and baseline implementation publicly available for further research.

pdf bib
Unpredictable Attributes in Market Comment Generation
Yumi Hamazono | Tatsuya Ishigaki | Yusuke Miyao | Hiroya Takamura | Ichiro Kobayashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation


pdf bib
Learning with Contrastive Examples for Data-to-Text Generation
Yui Uehara | Tatsuya Ishigaki | Kasumi Aoki | Hiroshi Noji | Keiichi Goshima | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Proceedings of the 28th International Conference on Computational Linguistics

Existing models for data-to-text tasks generate fluent but sometimes incorrect sentences e.g., “Nikkei gains” is generated when “Nikkei drops” is expected. We investigate models trained on contrastive examples i.e., incorrect sentences or terms, in addition to correct ones to reduce such errors. We first create rules to produce contrastive examples from correct ones by replacing frequent crucial terms such as “gain” or “drop”. We then use learning methods with several losses that exploit contrastive examples. Experiments on the market comment generation task show that 1) exploiting contrastive examples improves the capability of generating sentences with better lexical choice, without degrading the fluency, 2) the choice of the loss function is an important factor because the performances on different metrics depend on the types of loss functions, and 3) the use of the examples produced by some specific rules further improves performance. Human evaluation also supports the effectiveness of using contrastive examples.


pdf bib
Discourse-Aware Hierarchical Attention Network for Extractive Single-Document Summarization
Tatsuya Ishigaki | Hidetaka Kamigaito | Hiroya Takamura | Manabu Okumura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Discourse relations between sentences are often represented as a tree, and the tree structure provides important information for summarizers to create a short and coherent summary. However, current neural network-based summarizers treat the source document as just a sequence of sentences and ignore the tree-like discourse structure inherent in the document. To incorporate the information of a discourse tree structure into the neural network-based summarizers, we propose a discourse-aware neural extractive summarizer which can explicitly take into account the discourse dependency tree structure of the source document. Our discourse-aware summarizer can jointly learn the discourse structure and the salience score of a sentence by using novel hierarchical attention modules, which can be trained on automatically parsed discourse dependency trees. Experimental results showed that our model achieved competitive or better performances against state-of-the-art models in terms of ROUGE scores on the DailyMail dataset. We further conducted manual evaluations. The results showed that our approach also gained the coherence of the output summaries.

pdf bib
Controlling Contents in Data-to-Document Generation with Human-Designed Topic Labels
Kasumi Aoki | Akira Miyazawa | Tatsuya Ishigaki | Tatsuya Aoki | Hiroshi Noji | Keiichi Goshima | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Proceedings of the 12th International Conference on Natural Language Generation

We propose a data-to-document generator that can easily control the contents of output texts based on a neural language model. Conventional data-to-text model is useful when a reader seeks a global summary of data because it has only to describe an important part that has been extracted beforehand. However, because depending on users, it differs what they are interested in, so it is necessary to develop a method to generate various summaries according to users’ interests. We develop a model to generate various summaries and to control their contents by providing the explicit targets for a reference to the model as controllable factors. In the experiments, we used five-minute or one-hour charts of 9 indicators (e.g., Nikkei225), as time-series data, and daily summaries of Nikkei Quick News as textual data. We conducted comparative experiments using two pieces of information: human-designed topic labels indicating the contents of a sentence and automatically extracted keywords as the referential information for generation.

pdf bib
Learning to Select, Track, and Generate for Data-to-Text
Hayate Iso | Yui Uehara | Tatsuya Ishigaki | Hiroshi Noji | Eiji Aramaki | Ichiro Kobayashi | Yusuke Miyao | Naoaki Okazaki | Hiroya Takamura
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which record has been mentioned. Our generation module generates a summary conditioned on the state of tracking module. Our proposed model is considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we also explore the effectiveness of the writer information for generations. Experimental results show that our proposed model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.


pdf bib
Generating Market Comments Referring to External Resources
Tatsuya Aoki | Akira Miyazawa | Tatsuya Ishigaki | Keiichi Goshima | Kasumi Aoki | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Proceedings of the 11th International Conference on Natural Language Generation

Comments on a stock market often include the reason or cause of changes in stock prices, such as “Nikkei turns lower as yen’s rise hits exporters.” Generating such informative sentences requires capturing the relationship between different resources, including a target stock price. In this paper, we propose a model for automatically generating such informative market comments that refer to external resources. We evaluated our model through an automatic metric in terms of BLEU and human evaluation done by an expert in finance. The results show that our model outperforms the existing model both in BLEU scores and human judgment.


pdf bib
Summarizing Lengthy Questions
Tatsuya Ishigaki | Hiroya Takamura | Manabu Okumura
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this research, we propose the task of question summarization. We first analyzed question-summary pairs extracted from a Community Question Answering (CQA) site, and found that a proportion of questions cannot be summarized by extractive approaches but requires abstractive approaches. We created a dataset by regarding the question-title pairs posted on the CQA site as question-summary pairs. By using the data, we trained extractive and abstractive summarization models, and compared them based on ROUGE scores and manual evaluations. Our experimental results show an abstractive method using an encoder-decoder model with a copying mechanism achieves better scores for both ROUGE-2 F-measure and the evaluations by human judges.