Yumi Hamazono


Generating Racing Game Commentary from Vision, Language, and Structured Data
Tatsuya Ishigaki | Goran Topic | Yumi Hamazono | Hiroshi Noji | Ichiro Kobayashi | Yusuke Miyao | Hiroya Takamura
Proceedings of the 14th International Conference on Natural Language Generation

We propose the task of automatically generating commentaries for races in a motor racing game from vision, structured numerical, and textual data. Commentaries provide information that helps spectators understand events in races. Commentary generation models need to interpret the race situation and generate the correct content at the right moment. We divide the task into two subtasks: utterance timing identification and utterance generation. Because existing datasets do not have such alignments of data in multiple modalities, this setting has not been explored in depth. In this study, we introduce a new large-scale dataset that contains aligned video data, structured numerical data, and transcribed commentaries consisting of 129,226 utterances in 1,389 races in a game. Our analysis reveals that the characteristics of commentaries vary over time and across viewpoints. Our experiments on the subtasks show that it is still challenging for a state-of-the-art vision encoder to capture useful information from videos to generate accurate commentaries. We make the dataset and baseline implementation publicly available for further research.

Unpredictable Attributes in Market Comment Generation
Yumi Hamazono | Tatsuya Ishigaki | Yusuke Miyao | Hiroya Takamura | Ichiro Kobayashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation


Market Comment Generation from Data with Noisy Alignments
Yumi Hamazono | Yui Uehara | Hiroshi Noji | Yusuke Miyao | Hiroya Takamura | Ichiro Kobayashi
Proceedings of the 13th International Conference on Natural Language Generation

End-to-end data-to-text models learn the mapping between data and text from the aligned pairs in the dataset. However, these alignments are not always obtained reliably, especially for time-series data, where comments are given in real time and may be delivered with a delay relative to the actual event time. To handle this issue of possibly noisy alignments in the dataset, we propose a neural network model with multi-timestep data and a copy mechanism, which allows the model to learn the correspondences between data and text from a dataset with noisier alignments. We focus on generating market comments in Japanese that are delivered each time an event occurs in the market. The core idea of our approach is to utilize multi-timestep data, that is, not only the latest market price data when the comment is delivered, but also the data obtained at several earlier timesteps. On top of this, we employ a copy mechanism that is suitable for referring to the content of data records in the market price data. We confirm the superiority of our proposal on two evaluation metrics and show that our proposed method improves the accuracy of sentence generation from time-series data.