Taichi Nishimura


Visual Grounding Annotation of Recipe Flow Graph
Taichi Nishimura | Suzushi Tomori | Hayato Hashimoto | Atsushi Hashimoto | Yoko Yamakata | Jun Harashima | Yoshitaka Ushiku | Shinsuke Mori
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we provide a dataset that gives visual grounding annotations to recipe flow graphs. A recipe flow graph is a representation of the cooking workflow, designed so that the workflow can be understood through natural language processing. Such a workflow becomes more valuable when grounded in real-world activities, and visual grounding is one way to do so. Visual grounding is provided as bounding boxes on the image sequences of recipes, and each bounding box is linked to an element of the workflow. Because the workflows are also linked to the text, this annotation indirectly provides visual grounding that carries the workflow’s contextual information between the procedural text and visual observation. We additionally annotated each bounding box with one of two event attributes: “doing-the-action” or “done-the-action”. The annotation yielded 2,300 bounding boxes in 272 flow graph recipes. Experiments showed that the proposed dataset enables us to estimate contextual information described in recipe flow graphs from an image sequence.


Procedural Text Generation from a Photo Sequence
Taichi Nishimura | Atsushi Hashimoto | Shinsuke Mori
Proceedings of the 12th International Conference on Natural Language Generation

Multimedia procedural texts, such as instructions and manuals with pictures, help people share how-to knowledge. In this paper, we propose a method for generating a procedural text from a photo sequence, allowing users to obtain a multimedia procedural text. We propose a single embedding space for both images and text, which makes it possible to interconnect them and to select appropriate words to describe a photo. We implemented our method and tested it on cooking instructions, i.e., recipes. Experimental results showed that our method outperforms standard baselines.