Zooming in on Zero-Shot Intent-Guided and Grounded Document Generation using LLMs

Pritika Ramu, Pranshu Gaur, Rishita Emandi, Himanshu Maheshwari, Danish Javed, Aparna Garimella


Abstract
Repurposing existing content on the fly to suit an author’s goals is crucial for creating initial drafts of documents. We introduce the task of intent-guided and grounded document generation: given a user-specified intent (e.g., a section title) and a few reference documents, the goal is to generate section-level multimodal documents spanning text and images, grounded in the given references, in a zero-shot setting. We present a data curation strategy to obtain general-domain samples from Wikipedia, and collect 1,000 Wikipedia sections consisting of textual and image content along with appropriate intent specifications and references. We propose a simple yet effective planning-based prompting strategy, Multimodal Plan-And-Write (MM-PAW), which prompts LLMs to generate an intermediate plan with text and image descriptions that guides the subsequent generation. We compare the performance of MM-PAW and a text-only variant of it with that of zero-shot Chain-of-Thought (CoT) prompting, using recent closed- and open-domain LLMs. Both variants lead to significantly better performance in terms of content relevance, structure, and groundedness to the references, more so for the smaller models (up to a 12.5-point increase in ROUGE-1 F1) than for the larger ones (up to a 4-point increase in ROUGE-1 F1). They are particularly effective at improving relatively smaller models, bringing their performance on par with or above that of their larger counterparts on this task.
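The two-stage idea described in the abstract (first an intermediate plan with text and image descriptions, then the section itself) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names, and the `llm` callable (any function mapping a prompt string to a completion string) are all assumptions made for illustration.

```python
from typing import Callable, List


def _format_references(references: List[str]) -> str:
    # Number the references so the model can be asked to stay grounded in them.
    return "\n\n".join(
        f"[Reference {i + 1}]\n{ref}" for i, ref in enumerate(references)
    )


def build_plan_prompt(intent: str, references: List[str]) -> str:
    """Stage 1 (hypothetical wording): ask for a plan of text and image descriptions."""
    return (
        f"Intent (section title): {intent}\n\n"
        f"{_format_references(references)}\n\n"
        "Using only the references above, write a plan for this section: "
        "list the text points to cover and describe any images to include."
    )


def build_write_prompt(intent: str, references: List[str], plan: str) -> str:
    """Stage 2 (hypothetical wording): ask for the section, guided by the plan."""
    return (
        f"Intent (section title): {intent}\n\n"
        f"{_format_references(references)}\n\n"
        f"Plan:\n{plan}\n\n"
        "Follow the plan to write the section, grounded in the references."
    )


def plan_and_write(
    intent: str, references: List[str], llm: Callable[[str], str]
) -> str:
    # The plan produced by the first call is fed into the second call.
    plan = llm(build_plan_prompt(intent, references))
    return llm(build_write_prompt(intent, references, plan))
```

Any zero-shot text-completion function can be passed as `llm`; the sketch only fixes the order of the two calls and the fact that the plan is included in the second prompt.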
Anthology ID:
2024.inlg-main.52
Volume:
Proceedings of the 17th International Natural Language Generation Conference
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
676–694
URL:
https://aclanthology.org/2024.inlg-main.52
Cite (ACL):
Pritika Ramu, Pranshu Gaur, Rishita Emandi, Himanshu Maheshwari, Danish Javed, and Aparna Garimella. 2024. Zooming in on Zero-Shot Intent-Guided and Grounded Document Generation using LLMs. In Proceedings of the 17th International Natural Language Generation Conference, pages 676–694, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Zooming in on Zero-Shot Intent-Guided and Grounded Document Generation using LLMs (Ramu et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-main.52.pdf
Supplementary attachment:
2024.inlg-main.52.Supplementary_Attachment.pdf