UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations

Souvik Das; Rohini K. Srihari

doi:10.18653/v1/2024.findings-acl.102

Abstract

Large Language Models (LLMs) have made significant progress in integrating safety and knowledge alignment. However, adversarial actors can manipulate these models into generating unsafe responses, and excessive safety alignment can lead to unintended hallucinations. To address these challenges, we introduce UniWiz, a novel 2-step data orchestration framework that unifies safety and knowledge data generation. We propose a “safety-priming” method to generate synthetic safety data and overcome safety bottlenecks. We also inject relevant knowledge into conversations by retrieving factual information from curated sources. The UniWiz dataset consists of 17,638 quality-controlled conversations and 10,000 augmented preference data. Pretrained models fine-tuned on UniWiz show improvements across various metrics and outperform state-of-the-art instruction-tuned models trained on much larger datasets.

Anthology ID: 2024.findings-acl.102
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1749–1762
URL: https://aclanthology.org/2024.findings-acl.102/
DOI: 10.18653/v1/2024.findings-acl.102
Cite (ACL): Souvik Das and Rohini Srihari. 2024. UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1749–1762, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations (Das & Srihari, Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.102.pdf