Recipe Instruction Semantics Corpus (RISeC): Resolving Semantic Structure and Zero Anaphora in Recipes

Yiwei Jiang, Klim Zaporojets, Johannes Deleu, Thomas Demeester, Chris Develder


Abstract
We propose a newly annotated dataset for information extraction on recipes. Unlike previous approaches to machine comprehension of procedural texts, we avoid a priori pre-defining domain-specific predicates to recognize (e.g., the primitive instructionsin MILK) and focus on basic understanding of the expressed semantics rather than directly reduce them to a simplified state representation (e.g., ProPara). We thus frame the semantic comprehension of procedural text such as recipes, as fairly generic NLP subtasks, covering (i) entity recognition (ingredients, tools and actions), (ii) relation extraction (what ingredients and tools are involved in the actions), and (iii) zero anaphora resolution (link actions to implicit arguments, e.g., results from previous recipe steps). Further, our Recipe Instruction Semantic Corpus (RISeC) dataset includes textual descriptions for the zero anaphora, to facilitate language generation thereof. Besides the dataset itself, we contribute a pipeline neural architecture that addresses entity and relation extractionas well an identification of zero anaphora. These basic building blocks can facilitate more advanced downstream applications (e.g., question answering, conversational agents).
Anthology ID:
2020.aacl-main.82
Volume:
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
Kam-Fai Wong, Kevin Knight, Hua Wu
Venue:
AACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
821–826
Language:
URL:
https://aclanthology.org/2020.aacl-main.82
DOI:
10.18653/v1/2020.aacl-main.82
Bibkey:
Cite (ACL):
Yiwei Jiang, Klim Zaporojets, Johannes Deleu, Thomas Demeester, and Chris Develder. 2020. Recipe Instruction Semantics Corpus (RISeC): Resolving Semantic Structure and Zero Anaphora in Recipes. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 821–826, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Recipe Instruction Semantics Corpus (RISeC): Resolving Semantic Structure and Zero Anaphora in Recipes (Jiang et al., AACL 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.aacl-main.82.pdf
Data
RISeC