2024
pdf
bib
abs
Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias
Shucheng Zhu
|
Bingjie Du
|
Jishun Zhao
|
Ying Liu
|
Pengyuan Liu
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Pre-trained language models (PLMs) have achieved success in various of natural language processing (NLP) tasks. However, PLMs also introduce some disquieting safety problems, such as gender bias. Gender bias is an extremely complex issue, because different individuals may hold disparate opinions on whether the same sentence expresses harmful bias, especially those seemingly neutral or positive. This paper first defines the concept of contextualized gender bias (CGB), which makes it easy to measure implicit gender bias in both PLMs and annotators. We then construct CGBDataset, which contains 20k natural sentences with gendered words, from Chinese news. Similar to the task of masked language models, gendered words are masked for PLMs and annotators to judge whether a male word or a female word is more suitable. Then, we introduce CGBFrame to measure the gender bias of annotators. By comparing the results measured by PLMs and annotators, we find that though there are differences on the choices made by PLMs and annotators, they show significant consistency in general.
2022
pdf
bib
abs
中文自然语言处理多任务中的职业性别偏见测量(Measurement of Occupational Gender Bias in Chinese Natural Language Processing Tasks)
Mengqing Guo (郭梦清)
|
Jiali Li (李加厉)
|
Jishun Zhao (赵继舜)
|
Shucheng Zhu (朱述承)
|
Ying Liu (刘颖)
|
Pengyuan Liu (刘鹏远)
Proceedings of the 21st Chinese National Conference on Computational Linguistics
“尽管悲观者认为,职场中永远不可能存在性别平等。但随着人们观念的转变,愈来愈多的人们相信,职业的选择应只与个人能力相匹配,而不应由个体的性别决定。目前已经发现自然语言处理的各个任务中都存在着职业性别偏见。但这些研究往往只针对特定的英文任务,缺乏针对中文的、综合多任务的职业性别偏见测量研究。本文基于霍兰德职业模型,从中文自然语言处理中常见的三个任务出发,测量了词向量、共指消解和文本生成中的职业性别偏见,发现不同任务中的职业性别偏见既有一定的共性,又存在着独特的差异性。总体来看,不同任务中的职业性别偏见反映了现实生活中人们对于不同性别所选择职业的刻板印象。此外,在设计不同任务的偏见测量指标时,还需要考虑如语体、词序等语言学要素的影响。”
2021
pdf
bib
abs
中文句子级性别无偏数据集构建及预训练语言模型的性别偏度评估(Construction of Chinese Sentence-Level Gender-Unbiased Data Set and Evaluation of Gender Bias in Pre-Training Language)
Jishun Zhao (赵继舜)
|
Bingjie Du (杜冰洁)
|
Shucheng Zhu (朱述承)
|
Pengyuan Liu (刘鹏远)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
自然语言处理领域各项任务中,模型广泛存在性别偏见。然而当前尚无中文性别偏见评估和消偏的相关数据集,因此无法对中文自然语言处理模型中的性别偏见进行评估。首先本文根据16对性别称谓词,从一个平面媒体语料库中筛选出性别无偏的句子,构建了一个含有20000条语句的中文句子级性别无偏数据集SlguSet。随后,本文提出了一个可衡量预训练语言模型性别偏见程度的指标,并对5种流行的预训练语言模型中的性别偏见进行评估。结果表明,中文预训练语言模型中存在不同程度的性别偏见,该文所构建数据集能够很好的对中文预训练语言模型中的性别偏见进行评估。同时,该数据集还可作为评估预训练语言模型消偏方法的数据集。
pdf
bib
abs
中文关系抽取的句级语言学特征探究(A Probe into the Sentence-level Linguistic Features of Chinese Relation Extraction)
Baixi Xing (邢百西)
|
Jishun Zhao (赵继舜)
|
Pengyuan Liu (刘鹏远)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
神经网络模型近些年在关系抽取任务上已经展示出了很好的效果,然而我们对于特征提取的过程所知甚少,而这也进一步限制了深度神经网络模型在关系抽取任务上的进一步发展。当前已有研究工作对英文关系抽取的语言学特征进行探究,并且得到了一些规律。然而由于中文与西方语言之间明显的差异性,其所探究到的规律与解释性不适用于中文关系抽取。本文首次对中文关系抽取神经网络进行探究,采用了四个角度共13种探究任务,其中包含中文特有的分词探究任务。在两个关系抽取数据集上进行了实验,探究了中文关系抽取模型进行特征提取的规律。