Zhenqi Ye
2025
Sequence Structure Aware Retriever for Procedural Document Retrieval: A New Dataset and Baseline
Zhenqi Ye | HaoPeng Ren | Yi Cai | Qingbao Huang | Jing Qin | Pinli Zhu | Songwen Gong
Findings of the Association for Computational Linguistics: EMNLP 2025
Execution failures are common in daily life when individuals perform procedural tasks, such as cooking or making handicrafts. Retrieving relevant procedural documents that align closely with both the content of the steps and the overall execution sequence can help correct these failures with fewer modifications. However, existing retrieval methods, which primarily focus on declarative knowledge, often neglect the execution sequence structures inherent in procedural documents. To tackle this challenge, we introduce a new dataset, Procedural Questions, and propose a retrieval model, Graph-Fusion Procedural Document Retriever (GFPDR), which integrates procedural graphs with document representations. Extensive experiments demonstrate the effectiveness of GFPDR, highlighting its superior performance in procedural document retrieval compared to existing models.
2024
Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New Dataset and Baseline
Haopeng Ren | Yushi Zeng | Yi Cai | Zhenqi Ye | Li Yuan | Pinli Zhu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Much commonsense knowledge in the real world takes the form of procedures, or sequences of steps to achieve particular goals. In recent years, knowledge extraction from procedural documents has attracted considerable attention. However, existing studies often focus on procedural text and ignore a common multimodal scenario in the real world. Images and text can complement each other semantically, alleviating the semantic ambiguity of the text-only modality. Motivated by this, in this paper we explore the problem of grounded multimodal procedural entity recognition (GMPER), which aims to detect entities and their corresponding bounding box groundings in images (i.e., visual entities). A new dataset (Wiki-GMPER) is built, and extensive experiments are conducted to evaluate the effectiveness of our proposed model.