Jiali Li


2024

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
Yizhi Li | Ge Zhang | Xingwei Qu | Jiali Li | Zhaoqun Li | Noah Wang | Hao Li | Ruibin Yuan | Yinghao Ma | Kai Zhang | Wangchunshu Zhou | Yiming Liang | Lei Zhang | Lei Ma | Jiajun Zhang | Zuowen Li | Wenhao Huang | Chenghua Lin | Jie Fu
Findings of the Association for Computational Linguistics: ACL 2024

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following. Yet, their effectiveness often diminishes in low-resource languages like Chinese, exacerbated by biased evaluations from data leakage, casting doubt on their true generalizability to new linguistic territories. In response, we introduce the Chinese Instruction-Following Benchmark (CIF-Bench), designed to evaluate the zero-shot generalizability of LLMs to the Chinese language. CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances across 20 categories. To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance, totaling 45,000 data instances. Our evaluation of 28 selected LLMs reveals a noticeable performance gap, with the best model scoring only 52.9%, highlighting the limitations of LLMs in less familiar language and task contexts. This work not only uncovers the current limitations of LLMs in handling Chinese language tasks but also sets a new standard for future LLM generalizability research, pushing towards the development of more adaptable, culturally informed, and linguistically diverse models.
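To make the variance-reduction idea concrete, here is a minimal sketch of scoring a model by averaging over diversified instruction paraphrases of each task instance. The data layout, the exact_match metric, and the stand-in model are illustrative assumptions, not CIF-Bench's actual API or scoring procedure.

```python
from statistics import mean

def exact_match(prediction: str, reference: str) -> float:
    """Toy metric for illustration; CIF-Bench's real scoring is richer."""
    return float(prediction.strip() == reference.strip())

def evaluate(model, instances):
    """Average scores across instruction paraphrases, then across instances,
    so that instruction-induced score variance is smoothed out."""
    per_instance = []
    for inst in instances:
        scores = [
            exact_match(model(f"{prompt}\n{inst['input']}"), inst["output"])
            for prompt in inst["instructions"]  # several paraphrases of one task
        ]
        per_instance.append(mean(scores))
    return mean(per_instance)

# Usage with a trivial stand-in "model":
instances = [
    {"instructions": ["把下面的句子翻译成英文:", "请将以下句子译为英语:"],
     "input": "你好", "output": "Hello"},
]
print(evaluate(lambda prompt: "Hello", instances))  # 1.0
```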

2022

Analysis of Gender Bias in Social Perception and Judgement Using Chinese Word Embeddings
Jiali Li | Shucheng Zhu | Ying Liu | Pengyuan Liu
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Gender is constructed in line with social perception and judgment, and language is an important means of this construction. When natural language processing tools, such as word embeddings, associate gender with the relevant categories of social perception and judgment, they are likely to cause bias and harm to groups that do not conform to mainstream social perception and judgment. Using 12,251 Chinese word embeddings as an intermediary, this paper studies the relationship between social perception and judgment categories and gender. The results reveal that these grammatically gender-neutral Chinese word embeddings show a certain gender bias, consistent with mainstream society’s perception and judgment of gender. Men are judged by their actions and perceived in bad, easily disgusted, bad-tempered, and rational roles, while women are judged by their appearance and perceived in perfect, either-happy-or-sad, and emotional roles.
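The association test behind this kind of measurement can be sketched in a few lines: compare a target word's mean cosine similarity to male attribute words against its similarity to female ones. This is a generic WEAT-style sketch with random stand-in vectors; the paper's actual word lists, embeddings, and statistics may differ.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_association(target, male_vecs, female_vecs) -> float:
    """Mean similarity to male attribute words minus mean similarity to
    female ones; positive values indicate a male-leaning association."""
    male = np.mean([cosine(target, m) for m in male_vecs])
    female = np.mean([cosine(target, f) for f in female_vecs])
    return float(male - female)

# Random vectors stand in for real embeddings of attribute words such as 他/她.
rng = np.random.default_rng(0)
male_vecs = [rng.normal(size=50) for _ in range(3)]
female_vecs = [rng.normal(size=50) for _ in range(3)]
print(gender_association(rng.normal(size=50), male_vecs, female_vecs))
```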

中文自然语言处理多任务中的职业性别偏见测量(Measurement of Occupational Gender Bias in Chinese Natural Language Processing Tasks)
Mengqing Guo (郭梦清) | Jiali Li (李加厉) | Jishun Zhao (赵继舜) | Shucheng Zhu (朱述承) | Ying Liu (刘颖) | Pengyuan Liu (刘鹏远)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

Although pessimists hold that gender equality in the workplace will never be achieved, as attitudes shift more and more people believe that career choice should be matched only to individual ability, not determined by one’s gender. Occupational gender bias has already been found across natural language processing tasks, but existing studies mostly target specific English tasks; multi-task measurements of occupational gender bias in Chinese are lacking. Based on the Holland occupational model, this paper measures occupational gender bias in three common Chinese NLP tasks: word embeddings, coreference resolution, and text generation. We find that occupational gender bias across these tasks shares certain commonalities while also exhibiting distinct differences. Overall, the occupational gender bias in different tasks reflects real-world stereotypes about the occupations chosen by different genders. In addition, when designing bias-measurement metrics for different tasks, linguistic factors such as register and word order must also be taken into account.
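For the text-generation task, one crude way to quantify such bias is to templatize occupation prompts and count gendered pronouns in the model's continuations. A minimal sketch follows; the pronoun-counting proxy and the example prompt are assumptions for illustration, not the paper's metric, which also spans word embeddings and coreference resolution.

```python
def pronoun_skew(generations: list[str]) -> float:
    """Share of third-person pronouns that are male ('他') versus female
    ('她'); 0.5 would be balanced. Deliberately crude: it ignores, e.g.,
    generic uses of 他 and the register/word-order effects the paper notes."""
    text = "".join(generations)
    male, female = text.count("他"), text.count("她")
    return male / (male + female) if male + female else float("nan")

# e.g., model continuations for an occupation-templated prompt like "这位护士说……"
print(pronoun_skew(["她说她马上就到。", "他回答道。"]))  # 1 male vs 2 female -> 0.33
```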