In this paper, we present a dataset that provides visual grounding annotations for recipe flow graphs. A recipe flow graph is a representation of a cooking workflow, designed for understanding the workflow from a natural language processing perspective. Such a workflow becomes more valuable when grounded in real-world activities, and visual grounding is one way to achieve this. The visual grounding is provided as bounding boxes on the image sequences of recipes, with each bounding box linked to an element of the workflow. Because the workflows are also linked to the text, this annotation indirectly connects procedural text and visual observations while carrying the workflow's contextual information. We additionally annotated each bounding box with one of two event attributes: “doing-the-action” or “done-the-action.” The annotation yielded 2,300 bounding boxes across 272 recipes with flow graphs. Our experiments showed that the proposed dataset enables estimation of the contextual information described in recipe flow graphs from an image sequence.
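As an illustration only, the annotation structure described above could be modeled as in the following minimal sketch. All class, field, and function names (`FlowGraphNode`, `BoundingBoxAnnotation`, `text_for_box`, and so on) are hypothetical, as are the example values; they are not taken from the dataset's actual release format. The sketch shows how a bounding box reaches the recipe text indirectly, through the flow graph element it is linked to.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class EventAttribute(Enum):
    """The two event attributes named in the abstract."""
    DOING = "doing-the-action"
    DONE = "done-the-action"


@dataclass(frozen=True)
class FlowGraphNode:
    """One element of a recipe flow graph (hypothetical layout)."""
    node_id: int
    text_span: str  # the words in the recipe text that this node covers


@dataclass(frozen=True)
class BoundingBoxAnnotation:
    """One bounding box on an image of the recipe's image sequence."""
    image_index: int                 # position in the image sequence
    box: Tuple[int, int, int, int]   # assumed (x_min, y_min, x_max, y_max)
    node_id: int                     # link to a flow graph element
    event: EventAttribute


def text_for_box(ann: BoundingBoxAnnotation,
                 nodes: Dict[int, FlowGraphNode]) -> str:
    """Follow box -> flow graph node -> text (the indirect grounding)."""
    return nodes[ann.node_id].text_span


# Made-up example data, for illustration only.
nodes = {0: FlowGraphNode(node_id=0, text_span="cut"),
         1: FlowGraphNode(node_id=1, text_span="carrot")}
ann = BoundingBoxAnnotation(image_index=2, box=(10, 20, 110, 140),
                            node_id=1, event=EventAttribute.DONE)
grounded_text = text_for_box(ann, nodes)
```

Storing only `node_id` on the box, rather than the text itself, mirrors the indirect link described in the abstract: the text is reached through the workflow, so graph context stays available alongside each box.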