Suri: Multi-constraint Instruction Following in Long-form Text Generation

Chau Minh Pham; Simeng Sun; Mohit Iyyer

doi:10.18653/v1/2024.findings-emnlp.94

Abstract

Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challenges associated with collecting human preference judgments on long-form texts, preference-tuning algorithms such as DPO are infeasible in our setting; thus, we propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm. Instead of receiving negative feedback from dispreferred responses, I-ORPO obtains negative feedback from synthetically corrupted instructions generated by an LLM. Using Suri, we perform supervised and I-ORPO fine-tuning on Mistral-7b-Instruct-v0.2. The resulting models, Suri-SFT and Suri-I-ORPO, generate significantly longer texts (~5K tokens) than base models without significant quality deterioration. Our human evaluation shows that while both SFT and I-ORPO models satisfy most constraints, Suri-I-ORPO generations are generally preferred for their coherent and informative incorporation of the constraints.
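
The abstract describes I-ORPO only at a high level. Below is a minimal sketch, assuming I-ORPO keeps ORPO's objective (a supervised negative log-likelihood term plus a weighted odds-ratio term) but contrasts the same response y under the original instruction x and an LLM-corrupted instruction x' instead of contrasting a preferred and a dispreferred response. It is not the authors' code: the function names, the weight `lam`, and the example log-probabilities are hypothetical, and per-token log-probabilities are taken as given inputs rather than computed from a model.

```python
# Minimal sketch, ASSUMING I-ORPO reuses ORPO's loss with a corrupted
# instruction as the negative. Names and `lam` are hypothetical.
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: mean per-token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob: float) -> float:
    """log(P / (1 - P)) with P = exp(avg_logprob); uses expm1 for stability."""
    if avg_logprob >= 0.0:
        raise ValueError("average log-probability must be strictly negative")
    # log(1 - P) = log(-(exp(a) - 1)) = log(-expm1(a))
    return avg_logprob - math.log(-math.expm1(avg_logprob))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def i_orpo_loss(
    logp_y_given_x: Sequence[float],
    logp_y_given_corrupted_x: Sequence[float],
    lam: float = 0.1,  # hypothetical weight on the odds-ratio term
) -> float:
    """ORPO-style loss whose 'negative' is the same response y conditioned
    on a corrupted instruction, not a dispreferred response."""
    avg_pos = mean_logprob(logp_y_given_x)
    avg_neg = mean_logprob(logp_y_given_corrupted_x)
    sft_term = -avg_pos  # NLL of y under the original instruction
    log_odds_ratio = log_odds(avg_pos) - log_odds(avg_neg)
    or_term = softplus(-log_odds_ratio)  # equals -log(sigmoid(log_odds_ratio))
    return sft_term + lam * or_term


# Made-up per-token log-probs for one response y under x and under x'.
good = i_orpo_loss([-0.5, -0.7], [-2.0, -2.5])
bad = i_orpo_loss([-0.5, -0.7], [-0.4, -0.6])
print(f"corrupted instruction less likely: {good:.4f}")
print(f"corrupted instruction more likely: {bad:.4f}")
```

Both calls share the same supervised term, so the difference comes entirely from the odds-ratio term: the loss is lower when the model assigns y a lower likelihood under the corrupted instruction. Averaging per-token log-probabilities (as ORPO does) keeps P(y|x) in a usable range for texts of several thousand tokens, where a summed log-likelihood would drive the odds toward zero.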

Anthology ID:: 2024.findings-emnlp.94
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2024
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 1722–1753
URL:: https://aclanthology.org/2024.findings-emnlp.94/
Cite (ACL):: Chau Minh Pham, Simeng Sun, and Mohit Iyyer. 2024. Suri: Multi-constraint Instruction Following in Long-form Text Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1722–1753, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: Suri: Multi-constraint Instruction Following in Long-form Text Generation (Pham et al., Findings 2024)
PDF:: https://aclanthology.org/2024.findings-emnlp.94.pdf
