Cookpad Parsed Corpus: Linguistic Annotations of Japanese Recipes

Jun Harashima, Makoto Hiramatsu


Abstract
It has become increasingly common for people to share cooking recipes on the Internet. As the number of shared recipes has grown, so have the number of recipe-related studies and datasets. However, few datasets provide linguistic annotations for recipe-related studies, even though such annotations should form the basis of those studies. This paper introduces a novel recipe-related dataset, named Cookpad Parsed Corpus, which contains linguistic annotations for Japanese recipes. We randomly extracted 500 recipes from the largest recipe-related dataset, the Cookpad Recipe Dataset, and annotated 4,738 sentences in the recipes with morphemes, named entities, and dependency relations. This paper also reports benchmark results on our corpus for Japanese morphological analysis, named entity recognition, and dependency parsing. We show that there is still room for improvement in the analysis of recipes.
Anthology ID:
2020.law-1.8
Volume:
Proceedings of the 14th Linguistic Annotation Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain
Editors:
Stefanie Dipper, Amir Zeldes
Venue:
LAW
SIG:
SIGANN
Publisher:
Association for Computational Linguistics
Pages:
87–92
URL:
https://aclanthology.org/2020.law-1.8
Cite (ACL):
Jun Harashima and Makoto Hiramatsu. 2020. Cookpad Parsed Corpus: Linguistic Annotations of Japanese Recipes. In Proceedings of the 14th Linguistic Annotation Workshop, pages 87–92, Barcelona, Spain. Association for Computational Linguistics.
Cite (Informal):
Cookpad Parsed Corpus: Linguistic Annotations of Japanese Recipes (Harashima & Hiramatsu, LAW 2020)
PDF:
https://aclanthology.org/2020.law-1.8.pdf
Data
Microsoft Research Multimodal Aligned Recipe Corpus