Suri: Multi-constraint Instruction Following in Long-form Text Generation

Chau Pham, Simeng Sun, Mohit Iyyer


Abstract
Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challenges associated with collecting human preference judgments on long-form texts, preference-tuning algorithms such as DPO are infeasible in our setting; thus, we propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm. Instead of receiving negative feedback from dispreferred responses, I-ORPO obtains negative feedback from synthetically corrupted instructions generated by an LLM. Using Suri, we perform supervised and I-ORPO fine-tuning on Mistral-7B-Instruct-v0.2. The resulting models, Suri-SFT and Suri-I-ORPO, generate significantly longer texts (around 5K tokens) than base models without significant quality deterioration. Our human evaluation shows that while both SFT and I-ORPO models satisfy most constraints, Suri-I-ORPO generations are generally preferred for their coherent and informative incorporation of the constraints.
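The abstract describes I-ORPO only at a high level. As a rough illustration, the sketch below assumes I-ORPO keeps ORPO's objective (a negative log-likelihood term plus λ times a log-sigmoid odds-ratio penalty over length-normalized likelihoods) and changes only what is contrasted: the same response y is scored under the original instruction x and under a corrupted instruction x̃, instead of scoring a preferred and a dispreferred response under one instruction. The function names and the default λ = 0.1 are hypothetical, and per-token log-probabilities are taken as given rather than computed from a model.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: (1/m) * sum_t log P(y_t | x, y_<t)."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def log1mexp(a: float) -> float:
    """Numerically stable log(1 - exp(a)) for a < 0."""
    if a >= 0:
        raise ValueError("need a < 0 so that P = exp(a) < 1")
    # expm1 is accurate near 0; log1p is accurate for very negative a.
    if a > -math.log(2):
        return math.log(-math.expm1(a))
    return math.log1p(-math.exp(a))


def log_odds(avg_logprob: float) -> float:
    """log(P / (1 - P)) where P = exp(avg_logprob)."""
    return avg_logprob - log1mexp(avg_logprob)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def i_orpo_loss(
    logp_original: Sequence[float],
    logp_corrupted: Sequence[float],
    lam: float = 0.1,  # hypothetical weight on the odds-ratio term
) -> float:
    """Sketch of an ORPO-style loss that uses a corrupted instruction as the negative.

    logp_original:  per-token log P(y_t | x, y_<t) under the original instruction x
    logp_corrupted: per-token log P(y_t | x~, y_<t) for the SAME response y
                    under a corrupted instruction x~
    """
    avg_pos = mean_logprob(logp_original)
    avg_neg = mean_logprob(logp_corrupted)
    nll = -avg_pos  # supervised term on the original (x, y) pair only
    log_odds_ratio = log_odds(avg_pos) - log_odds(avg_neg)
    return nll - lam * log_sigmoid(log_odds_ratio)
```

Under these assumptions, the odds-ratio term lowers the loss when y is more likely under x than under x̃; if the model scores y identically under both instructions, that term contributes exactly λ·log 2 on top of the NLL.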
Anthology ID: 2024.findings-emnlp.94
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1722–1753
URL: https://aclanthology.org/2024.findings-emnlp.94
Cite (ACL): Chau Pham, Simeng Sun, and Mohit Iyyer. 2024. Suri: Multi-constraint Instruction Following in Long-form Text Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1722–1753, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Suri: Multi-constraint Instruction Following in Long-form Text Generation (Pham et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.94.pdf