CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Yufei Huang; Deyi Xiong

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Abstract

Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification, ambiguous context generation, AI-assisted disambiguous context generation, and manual review and recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in evaluating model bias, with all 12 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating harmful outputs, in the way of “moral self-correction”. Our dataset is available at https://anonymous.4open.science/r/CBBQ-B860/.

Anthology ID:: 2024.lrec-main.260
Volume:: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:: LREC | COLING
SIG:
Publisher:: ELRA and ICCL
Note:
Pages:: 2917–2929
Language:
URL:: https://aclanthology.org/2024.lrec-main.260/
DOI:
Bibkey:
Cite (ACL):: Yufei Huang and Deyi Xiong. 2024. CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2917–2929, Torino, Italia. ELRA and ICCL.
Cite (Informal):: CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models (Huang & Xiong, LREC-COLING 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.lrec-main.260.pdf

PDF Cite Search Fix data