Pengyuan Liu (刘鹏远)

Also published as: Peng-Yuan Liu, PengYuan Liu, 鹏远刘

2025
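
China's fine traditional culture is an important source for strengthening the country's cultural soft power in the new era, and combining traditional values with idioms helps to inherit and carry forward that heritage. This paper proposes a research framework for how traditional-value idioms behave in contemporary contexts. Based on the BCC corpus, we conduct a quantitative study of the distribution of the number of corpus instances of traditional-value idioms and of the idioms' preferences among traditional values, their sentiment tendencies and high-frequency word distributions in contemporary contexts, and their social-topic and moral characteristics. We also propose a contemporary social-topic and moral adaptability index for traditional-value idioms, so as to study their contemporary contextual behavior systematically. This paper offers a new perspective for the contemporary quantitative study of traditional culture and a reference for related research in the digital humanities, with the aim of strengthening the influence of China's fine traditional culture in the new era and contributing to the inheritance and innovation of Chinese civilization.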

Investigating Value-Reasoning Reliability in Small Large Language Models
Xia Du | Shuhan Sun | Pengyuan Liu | Dong Yu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Although small large language models (sLLMs) have been widely deployed in practical applications, little attention has been paid to their value-reasoning abilities, particularly in terms of reasoning reliability. To address this gap, we propose a systematic evaluation framework for assessing the Value-Reasoning Reliability of sLLMs. We define Value-Reasoning Reliability as comprising: (1) output consistency under identical prompts, (2) output robustness under semantically equivalent prompts, (3) stable value reasoning in the face of attacks, and (4) consistent value reasoning in open-ended value expression tasks. Our framework includes three core tasks: the Repetition Consistency task, the Interaction Stability task, and the Open-ended Expression Consistency task. We further incorporate self-reported confidence scores to evaluate the model’s value reasoning reliability from two perspectives: the model’s self-awareness of its values, and its value-based decision-making. Our findings show that models vary significantly in their stability when responding to value-related questions. Moreover, we observe considerable output randomness, which is not always correlated with the self-reported confidence or expressed value preferences. This suggests that current models lack a reliable internal mechanism for stable value reasoning when addressing value-sensitive queries.

基于多模型协同的儿童互联网新闻风险管理与价值观引导框架
梁宇蓝 | 王悦 | Dong Yu | Pengyuan Liu | Chen Kang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

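As the internet becomes widespread among children, the "residual toxicity" of news content and its lack of value guidance have become pressing safety challenges. This paper proposes CRV-LLM, a multi-model collaborative framework for rewriting news for children, which performs in-depth risk identification and targeted rewriting of the original news text along four dimensions: vocabulary, events, headlines, and values. CRV-LLM integrates four lightweight risk-detection models and the R1-Distill-Qwen-32B rewriting model; through collaboration and feedback among the models, it effectively removes potentially harmful information and introduces positive value guidance while preserving readability for children. Experimental results show that CRV-LLM outperforms mainstream models on core metrics such as safety and educational value and improves inference efficiency by 62%, providing an efficient and scalable technical solution for managing the safety of children's online content.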

Attribution and Application of Multiple Neurons in Multimodal Large Language Models
Feiyu Wang | Ziran Zhao | Dong Yu | Pengyuan Liu
Findings of the Association for Computational Linguistics: EMNLP 2025

Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance across various tasks. However, the internal mechanisms by which they interpret and integrate cross-modal information remain insufficiently understood. In this paper, to address the limitations of prior studies, which could only identify neurons corresponding to single tokens and relied on the vocabulary of LLMs, we propose a novel method to identify multimodal neurons in Transformer-based MLLMs. We then introduce fuzzy set theory to model the complex relationship between neurons and semantic concepts and to characterize how multiple neurons collaboratively contribute to semantic concepts. Through both theoretical analysis and empirical validation, we demonstrate the effectiveness of our method and present some meaningful findings. Furthermore, by modulating neuron activation values based on the constructed fuzzy sets, we enhance performance on the Visual Question Answering (VQA) task, showing the practical value of our approach in downstream applications of MLLMs.

What’s the most important value? INVP: INvestigating the Value Priorities of LLMs through Decision-making in Social Scenarios
Xuelin Liu | Pengyuan Liu | Dong Yu
Proceedings of the 31st International Conference on Computational Linguistics

As large language models (LLMs) demonstrate impressive performance in various tasks and are increasingly integrated into the decision-making process, ensuring they align with human values has become crucial. This paper highlights that value priorities, that is, the relative importance of different values, play a pivotal role in the decision-making process. To explore the value priorities in LLMs, this paper introduces INVP, a framework for INvestigating Value Priorities through decision-making in social scenarios. The framework encompasses social scenarios including binary decision-making, covering both individual and collective decision-making contexts, and is based on Schwartz’s value theory for constructing value priorities. Using this framework, we construct a dataset, which contains a total of 1613 scenarios and 3226 decisions across 283 topics. We evaluate seven popular LLMs and the experimental results reveal commonalities in the value priorities across different LLMs, such as an emphasis on Universalism and Benevolence, while Power and Hedonism are typically given lower priority. This study provides fresh insights into understanding and enhancing the moral and value alignment of LLMs when making complex social decisions.

2024

文本样式和主题框架引导下的大模型辅助儿童新闻生成(Text Styles and Thematic Framework Guided Large Modeling to Aid Children’s News Generation)
Xiaomeng Du (杜晓蒙) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

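Mainstream news content is mostly designed for adults; it is hard for children to understand and does not meet their reading needs. To address this, we propose a topic-based discourse-structure framework for children's news (TNC-LLM). The framework combines two core modules, Text Style Definition (TSD) and Topic Category Definition (TCD). The TSD module uses several machine learning algorithms to analyze features such as text style and paragraph layout at different granularities. The TCD module analyzes content by topic to reveal the writing characteristics and content tendencies of children's news and to ensure that the content is educational and appropriate. Our experiments mainly evaluate four models, including ChatGPT3.5, on converting adult news into news for children. The results show that TNC-LLM brings significant improvements in key dimensions of children's news generation, such as content accuracy, how engaging the text is, and educational value. In addition, the framework is general and can be applied to different types of large language models.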

Generate-then-Revise: An Effective Synthetic Training Data Generation Framework For Event Detection Retrieval
Huidong Du | Hao Sun | Pengyuan Liu | Dong Yu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

Large language models (LLMs) struggle with event detection (ED) due to the structured and variable number of events in the output. Existing supervised approaches rely on a large amount of manually annotated corpora, facing challenges in practice when event types are diverse and the annotated data is scarce. We propose Generate-then-Revise (GtR), a framework that leverages LLMs in the opposite direction to address these challenges in ED. GtR utilizes an LLM to generate high-quality training data in three stages, including a novel data revision step to minimize noise in the synthetic data. The generated data is then used to train a smaller model for evaluation. Our approach demonstrates significant improvements on the low-resource ED. We further analyze the generated data, highlighting the potential of synthetic data generation for enhancing ED performance.

Enhancing Free-Form Table Question Answering Models by Distilling Relevant-Cell-Based Rationales
Zhiyu Yang | Shuo Wang | Yukun Yan | Pengyuan Liu | Dong Yu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

Free-form table question answering is a challenging task since tables contain structured contents compared to plain texts, which requires high-level reasoning abilities to effectively identify cells that are relevant to the question and produce a correct and faithful answer based on their relations. Large language models (LLMs) have exhibited remarkable reasoning capabilities in numerous NLP applications. However, in some specific tasks, specially trained small models can still outperform LLMs. Furthermore, small models require far lower computation costs than LLMs. To leverage the strengths of both types of models, we propose a Relevant-Cell-based Knowledge Distillation with inference-time Teacher Guidance (RCKD-TG) method. This approach aims to combine small free-form table question answering models’ abilities to learn from human annotations and large language models’ abilities to effectively reason from table contents, via applying Relevant-Cell-based rationales distilled from LLMs to small models’ training and inference stages. Our experiments demonstrate the superiority of our method over vanilla small models in correctness, faithfulness, adequacy and fluency, and also over general LLMs in adhering to the style of human annotations. We achieve state-of-the-art performance on FeTaQA, a representative free-form table question answering benchmark. Our result of a 41.3 BLEU score demonstrates the feasibility of effectively using small models’ task-specific abilities and LLMs’ reasoning capabilities at the same time. Additionally, our method exhibits high computation efficiency and data efficiency. Compared to strong baselines, we achieve better performance with significantly less training data.

pdf bib abs

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that utilizes GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent can improve the performance of various LLMs, including both commercial and open-source models. Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.

人类思维指导下大小模型协同决策的中文修辞识别与理解方法
Wen Wang (王雯) | Siyi Tang (汤思怡) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

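CCL24-Eval Task 6 proposes a multi-level, fine-grained task of recognizing and understanding rhetoric in primary and secondary school essays. Based on the characteristics of the task, this paper proposes a method for Chinese rhetoric recognition and understanding in which large and small models make decisions collaboratively under the guidance of human thinking. Following how humans approach rhetoric recognition and understanding, the method redefines the order of the subtasks and selects a large or small language model for each step so that each step reaches a local optimum, using these local optima to reach the best overall result. The results show that the proposed method recognizes and understands rhetoric effectively, improving over the baseline by 13.54, 4.03, and 57.11 on the three tracks, respectively.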

中西谚语多元价值观资源库建设及对比研究(The construction and comparative study of the resource library of Chinese and Western proverbs and multiple values)
Xia Du (杜霞) | Pengyuan Liu (刘鹏远) | Dong Yu (于东)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

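Chinese and Western proverbs are the crystallization of Chinese and Western cultures and embody the most basic values of each. However, resources on the values of Chinese and Western proverbs are currently lacking, which makes it difficult to study the values these proverbs reflect comprehensively, especially through quantitative comparison. This paper therefore designs a multi-value system covering motivations and needs, shared and distinctive values, value judgments, and usage scenarios, and on this basis builds a multi-value resource library of Chinese and Western proverbs, which we then examine and compare. We find that Chinese and Western proverbs are similar in value judgments, usage scenarios, and some values, while each is distinctive in how specific meanings are expressed.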

Bridging the Gap between Authentic and Answer-Guided Images for Chinese Vision-Language Understanding Enhancement
Feiyu Wang | Wenyu Guo | Dong Yu | Chen Kang | Pengyuan Liu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

The objective of the Chinese Vision-Language Understanding Evaluation (CVLUE) is to comprehensively assess the performance of Chinese vision-language multimodal pre-trained models in multimodal modeling and understanding across four tasks: Image-Text Retrieval, Visual Question Answering, Visual Grounding, and Visual Dialog. To enhance the models’ performance across various multimodal tasks, this paper proposes a multimodal information understanding enhancement method based on answer-guided images. First, we propose task-specific methods for answer-guided image generation. Second, the authentic and answer-guided images are each fed into the model for multimodal fine-tuning. Finally, training objectives are set for different tasks to minimize the gap between the answer-guided images and authentic images, thereby using the answer-guided images to supervise the results produced from the authentic images. The experimental results demonstrate the effectiveness of the proposed method.

基于关系抽取的中文意合图语义解析方法研究
Hongying Huo (霍虹颖) | Shaoping Huang (黄少平) | Pengyuan Liu (刘鹏远)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

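A parataxis graph is an event-centered, single-rooted directed graph for semantic representation, and it is of great value for semantic computation and applications. The Chinese parataxis graph semantic parsing evaluation task is difficult in several respects: the graph is a single-rooted directed graph, it contains implicit event words, and its semantic relation types are so rich that the number of relation types becomes very large. To overcome these difficulties, this paper proposes a method that converts the task into relation extraction. The method first expands the labels into forward and reverse labels; second, it expands the input by adding the implicit event words to it, so that they need not be predicted separately; finally, it splits the task into relation extraction without implicit event words and relation extraction with implicit event words. Experimental results show that the method achieves an F1 of 64.44% on the official blind test set, 33.41% higher than the baseline model, demonstrating its effectiveness.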

基于领域信息分解式学习的大语言模型修辞认知增强方法(Method for Enhancing Rhetorical Cognition of Large Language Models Based on Decomposed Learning of Field Information)
Wen Wang (王雯) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

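Chinese rhetorical devices are diverse and differ greatly in concept, and large language models have deficiencies in their understanding of some of them. To address this problem, this paper studies how to enhance the rhetorical cognition of large language models and explores its relationship with rhetoric recognition performance. To this end, we propose the QAKAG framework, which first introduces the idea of decomposed information learning, detects deficiencies in a model's rhetorical cognition through question answering, and then explores the optimal information supplementation mechanism using four different ways of combining knowledge, thereby enhancing the model's rhetorical cognition. We construct MCRSD, a multi-category dataset of Chinese rhetorical sentences, and MCRKB, a rhetorical knowledge base, and conduct experiments on six large language models, including ChatGPT4, verifying the effectiveness of the QAKAG framework and the necessity of each of its stages. The results show that, with QAKAG, the average F1 of the six models on multi-category rhetoric recognition is 22.1% higher than when they answer the recognition question directly, outperforming the Zero-shot-CoT, RAG-BaiKe, and Few-Shot5 prompting strategies.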

大语言模型开放性生成文本中的职业性别偏见研究(A Study of Occupational Gender Bias in Open-Ended Text Generated by Large Language Models)
Xu Zhang (张旭) | Mengqing Guo (郭梦清) | Shucheng Zhu (朱述承) | Dong Yu (于东) | Ying Liu (刘颖) | Pengyuan Liu (刘鹏远)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

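Since their introduction, large language models have achieved striking performance on many natural language processing tasks. However, their potential safety and fairness problems have also drawn attention, in particular the possibility that generated text contains bias and discrimination against groups defined by occupation, gender, and so on. Using two forms of gender representation, this paper constructs explicit and implicit "gender + occupation" prompts, has large language models generate open-ended text from them, and analyzes bias in the generated text along three dimensions: sentiment polarity, lexical richness, and offensiveness. We evaluate and compare the intersectional bias of occupation with explicit gender and with implicit gender in a traditional model and in large language models represented by ChatGPT. The results show that, compared with single-dimension occupation or gender identity information, more complex intersectional occupation-gender identity information reduces bias in ChatGPT's generated text: sentiment polarity moves toward neutral and lexical richness increases. ChatGPT shows different attitudes toward different types of occupation-gender identities, with higher sentiment polarity for creative occupations such as investigative and artistic ones and lower sentiment polarity for occupations that involve dealing with people, such as conventional and enterprising ones. In addition, compared with the earlier GPT-2 model, ChatGPT has improved in generation ability and bias mitigation: text generated under a variety of combined-identity prompts is more positive and diverse, and offensive content is markedly reduced.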

pdf bib abs

Proper moral beliefs are fundamental for language models, yet assessing these beliefs poses a significant challenge. This study introduces a novel three-module framework to evaluate the moral beliefs of four prominent large language models. Initially, we constructed a dataset containing 472 moral choice scenarios in Chinese, derived from moral words. The decision-making process of the models in these scenarios reveals their moral principle preferences. By ranking these moral choices, we discern the varying moral beliefs held by different language models. Additionally, through moral debates, we investigate the firmness of these models to their moral choices. Our findings indicate that English language models, namely ChatGPT and Gemini, closely mirror moral decisions of the sample of Chinese university students, demonstrating strong adherence to their choices and a preference for individualistic moral beliefs. In contrast, Chinese models such as Ernie and ChatGLM lean towards collectivist moral beliefs, exhibiting ambiguity in their moral choices and debates. This study also uncovers gender bias embedded within the moral beliefs of all examined language models. Our methodology offers an innovative means to assess moral beliefs in both artificial and human intelligence, facilitating a comparison of moral values across different cultures.

Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias
Shucheng Zhu | Bingjie Du | Jishun Zhao | Ying Liu | Pengyuan Liu
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Pre-trained language models (PLMs) have achieved success in a variety of natural language processing (NLP) tasks. However, PLMs also introduce some disquieting safety problems, such as gender bias. Gender bias is an extremely complex issue, because different individuals may hold disparate opinions on whether the same sentence expresses harmful bias, especially for sentences that seem neutral or positive. This paper first defines the concept of contextualized gender bias (CGB), which makes it easy to measure implicit gender bias in both PLMs and annotators. We then construct CGBDataset, which contains 20k natural sentences with gendered words, drawn from Chinese news. Similar to the task of masked language models, gendered words are masked for PLMs and annotators to judge whether a male word or a female word is more suitable. Then, we introduce CGBFrame to measure the gender bias of annotators. By comparing the results measured by PLMs and annotators, we find that although there are differences in the choices made by PLMs and annotators, they show significant consistency in general.

2023

中国社会道德变化模型与发展动因探究——基于70年《人民日报》的计量与分析 (The Model of Moral Change and Motivation in Chinese Society ——The Vocabulary Analysis of the 70-year ”People’s Daily”)
Hongrui Wang (王弘睿) | Dong Yu (于东) | Pengyuan Liu (刘鹏远) | Liying Ceng (曾立英)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

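Studying how social morality changes over time is of great significance. Observing the diachronic link between language use and moral change helps to describe the trends and regularities of social morality, track its dynamics, and advance moral construction. At present there is no systematic, comprehensive study of changes in social morality that works from the lexical perspective and applies computational methods to large-scale diachronic corpora. To fill this gap, this paper proposes a diachronic quantitative model of moral topic words and uses quantitative indicators to compute and analyze 70 years of People's Daily text, from 1946 to 2015, observing how the choice and use of moral topic words changed over that period. The results show that the diachronic use of moral vocabulary interacts with social morality and reflects the changes and development of Chinese social morality over these 70 years.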

大规模语言模型增强的中文篇章多维度阅读体验量化研究(Quantitative Research on Multi-dimensional Reading Experience of Chinese Texts Enhanced by Large Language Model)
Jiadai Sun (孙嘉黛) | Siyi Tang (汤思怡) | Shike Wang (王诗可) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

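Existing research on graded reading usually starts from text readability and recommends books to readers in the form of discrete difficulty levels. A framework for studying the multifaceted, deeper reading experiences that readers have while reading is still lacking. We therefore investigate the different reading experiences readers have when reading Chinese texts and propose a quantification scheme for the multi-dimensional reading experience of Chinese texts. We group the continuous reading experiences that arise during reading into several categories and, on this basis, build a dataset of multi-dimensional reading experience for Chinese texts. We also examine how well ChatGPT, which is built on a large language model, can quantify reading experience, and find that although it has strong information extraction and semantic understanding abilities, it performs poorly at this. However, we find that the capabilities of large language models can assist the quantification of deep attributes through knowledge distillation, and on this basis we implement a large-language-model-enhanced model for quantifying the multi-dimensional reading experience of Chinese texts. The model reaches an average F1 of 0.72 across the reading-experience dimensions, higher than the 0.48 of ChatGPT's few-shot results.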

动词视角下的汉语性别表征研究——基于多语体语料库与依存分析(Gendered Representation in Chinese via Verbal Analysis —Based on a Multi-register Corpus and Dependency Parsing)
Yingshi Chen (陈颖诗) | Dong Yu (于东) | Pengyuan Liu (刘鹏远)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

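Actions are an important form in which gender socialization is reflected. Studying how Chinese verbs represent gender can reveal the paths, that is, the means and forms, by which language constructs different gender identities. Taking dependency relations as the entry point, this paper extracts from corpora of four registers the verbs that form dependency structures with gendered words, identifies verbs with significant gender differences, and carries out quantitative and qualitative analyses that combine semantics with the syntactic role played by the gendered word. Overall, most Chinese verbs are gender-neutral in what they represent, and only a minority reflect gender; as a language that carries Chinese wisdom and has a deep cultural heritage, Chinese is neutral and equal in its representation of gender, which also reflects the idea of gender equality in China. Among the verbs that do represent gender, two different paths for constructing male and female identities can be seen. In the corpora of all registers, verbs that significantly represent women outnumber those that significantly represent men, but the semantic distribution of verbs representing men is more balanced, reflecting a "male as default, female as special" pattern. With judicial verbs, women often appear as victims of violent acts while the male perpetrators remain invisible, reflecting a "male dominance, female submission" pattern. Verbs in different registers serve different functions in constructing gender: news shapes relatively traditional gender norms, while traditional literature and online literature break established gender norms in different ways.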

2022

Analysis of Gender Bias in Social Perception and Judgement Using Chinese Word Embeddings
Jiali Li | Shucheng Zhu | Ying Liu | Pengyuan Liu
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Gender is a construction in line with social perception and judgment. An important means of this construction is language. When natural language processing tools, such as word embeddings, associate gender with the relevant categories of social perception and judgment, they are likely to cause bias and harm to those groups that do not conform to the mainstream social perception and judgment. Using 12,251 Chinese word embeddings as a medium, this paper studies the relationship between social perception and judgment categories and gender. The results reveal that these grammatically gender-neutral Chinese word embeddings show a certain gender bias, which is consistent with mainstream society’s perception and judgment of gender. Men are judged by their actions and perceived as bad, easily-disgusted, bad-tempered and rational roles while women are judged by their appearances and perceived as perfect, either happy or sad, and emotional roles.

中文自然语言处理多任务中的职业性别偏见测量(Measurement of Occupational Gender Bias in Chinese Natural Language Processing Tasks)
Mengqing Guo (郭梦清) | Jiali Li (李加厉) | Jishun Zhao (赵继舜) | Shucheng Zhu (朱述承) | Ying Liu (刘颖) | Pengyuan Liu (刘鹏远)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

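Pessimists hold that gender equality can never exist in the workplace. Yet as attitudes change, more and more people believe that the choice of occupation should match only individual ability and should not be determined by a person's gender. Occupational gender bias has been found across natural language processing tasks, but existing studies tend to target specific English tasks, and measurement studies of occupational gender bias that focus on Chinese and cover multiple tasks are lacking. Based on Holland's occupational model, this paper starts from three common tasks in Chinese natural language processing and measures occupational gender bias in word embeddings, coreference resolution, and text generation, finding that occupational gender bias in the different tasks shows both commonalities and distinctive differences. Overall, the occupational gender bias in these tasks reflects real-world stereotypes about the occupations chosen by people of different genders. In addition, when designing bias metrics for different tasks, the influence of linguistic factors such as register and word order also needs to be considered.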

CLGC: A Corpus for Chinese Literary Grace Evaluation
Yi Li | Dong Yu | Pengyuan Liu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we construct a Chinese literary grace corpus, CLGC, with 10,000 texts and more than 1.85 million tokens. Multi-level annotations are provided for each text in our corpus, including literary grace level, sentence category, and figure-of-speech type. Based on the corpus, we dig deep into the correlation between fine-grained features (semantic information, part-of-speech and figure-of-speech, etc.) and literary grace level. We also propose a new Literary Grace Evaluation (LGE) task, which aims at making a comprehensive assessment of the literary grace level according to the text. In the end, we build some classification models with machine learning algorithms (such as SVM, TextCNN) to prove the effectiveness of our features and corpus for LGE. The results of our preliminary classification experiments have achieved 79.71% on the weighted average F1-score.

From Polarity to Intensity: Mining Morality from Semantic Space
Chunxu Zhao | Pengyuan Liu | Dong Yu
Proceedings of the 29th International Conference on Computational Linguistics

Most works on computational morality focus on moral polarity recognition, i.e., distinguishing right from wrong. However, a discrete polarity label is not informative enough to reflect morality as it does not contain any degree or intensity information. Existing approaches to compute moral intensity are limited to word-level measurement and heavily rely on human labelling. In this paper, we propose MoralScore, a weakly-supervised framework that can automatically measure moral intensity from text. It only needs moral polarity labels, which are more robust and easier to acquire. Besides, the framework can capture latent moral information not only from words but also from sentence-level semantics which can provide a more comprehensive measurement. To evaluate the performance of our method, we introduce a set of evaluation metrics and conduct extensive experiments. Results show that our method achieves good performance on both automatic and human evaluations.

CoreValue:面向价值观计算的中文核心价值-行为体系及知识库(CoreValue: Chinese Core Value-Behavior Frame and Knowledge Base for Value Computing)
Pengyuan Liu (刘鹏远) | Sanle Zhang (张三乐) | Dong Yu (于东) | Lin Bo (薄琳)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

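Inferring an agent's values from its behavior is one of the prerequisites for artificial intelligence to understand and possess human values. In NLP-related fields, research has focused mainly on right-or-wrong judgments about the values or morality of text; work that infers an agent's values from its behavior is rare, and corresponding data resources are lacking. This paper first builds a Chinese core value-behavior frame. Grounded in the core socialist values, the frame has two parts: 1) a category system, comprising 8 major categories of core values, further subdivided into 19 subcategories of bidirectional values that correspond to 38 categories of behavior; and 2) an element system, divided into 7 kinds of core and non-core elements. Next, sentences containing agent behavior are extracted from a corpus and manually annotated according to the frame, yielding a fine-grained Chinese value-behavior knowledge base with 6,994 behavior sentences, their fine-grained values and directions, and 34,965 elements. Finally, the paper proposes value category classification, direction classification, and joint classification tasks and runs experiments on them. The results show that methods based on pre-trained language models perform very well on value direction classification, while there is considerable room for improvement on fine-grained value category classification and multi-label value category classification.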

2021

中文关系抽取的句级语言学特征探究(A Probe into the Sentence-level Linguistic Features of Chinese Relation Extraction)
Baixi Xing (邢百西) | Jishun Zhao (赵继舜) | Pengyuan Liu (刘鹏远)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

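Neural network models have performed well on relation extraction in recent years, yet we know little about how they extract features, which in turn limits the further development of deep neural models for this task. Existing work has probed the linguistic features of English relation extraction and found some regularities. However, because of the clear differences between Chinese and Western languages, those regularities and explanations do not carry over to Chinese relation extraction. This paper is the first to probe neural networks for Chinese relation extraction, using 13 probing tasks from four perspectives, including a word segmentation probing task specific to Chinese. We run experiments on two relation extraction datasets and explore the regularities of feature extraction in Chinese relation extraction models.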

BLCUFIGHT at SemEval-2021 Task 10: Novel Unsupervised Frameworks For Source-Free Domain Adaptation
Weikang Wang | Yi Wu | Yixiang Liu | Pengyuan Liu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Domain adaptation assumes that samples from the source and target domains are freely accessible during training. However, this assumption is rarely plausible in the real world and may cause data-privacy issues, especially when the label of the source domain can be a sensitive attribute that acts as an identifier. SemEval-2021 Task 10 focuses on these issues. We participate in the task and propose novel frameworks based on self-training. In our systems, two different frameworks are designed to solve text classification and sequence labeling. These approaches prove effective, ranking third among all systems in subtask A and first among all systems in subtask B.

A Comparative Study of Collocation Extraction Methods from the Perspectives of Vocabulary and Grammar: A Case Study in the Field of Journalism
Lulu Gu | Yue Pan | Pengyuan Liu
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

中文句子级性别无偏数据集构建及预训练语言模型的性别偏度评估(Construction of Chinese Sentence-Level Gender-Unbiased Data Set and Evaluation of Gender Bias in Pre-Training Language)
Jishun Zhao (赵继舜) | Bingjie Du (杜冰洁) | Shucheng Zhu (朱述承) | Pengyuan Liu (刘鹏远)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

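Gender bias is widespread in models across natural language processing tasks. However, there are currently no datasets for evaluating or mitigating gender bias in Chinese, so gender bias in Chinese natural language processing models cannot be assessed. This paper first uses 16 pairs of gendered terms of address to select gender-unbiased sentences from a print media corpus and builds SlguSet, a Chinese sentence-level gender-unbiased dataset of 20,000 sentences. We then propose a metric for the degree of gender bias in pre-trained language models and evaluate the gender bias in five popular pre-trained language models. The results show that Chinese pre-trained language models exhibit gender bias to varying degrees and that the dataset is well suited to evaluating it. The dataset can also be used to evaluate debiasing methods for pre-trained language models.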

2020

Imbalanced Chinese Multi-label Text Classification Based on Alternating Attention
Hongliang Bi | Han Hu | Pengyuan Liu
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

小样本关系分类研究综述(Few-Shot Relation Classification: A Survey)
Han Hu (胡晗) | Pengyuan Liu (刘鹏远)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

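Relation classification is an important step in building structured knowledge and has received much attention in natural language processing. In many application domains, such as medicine and finance, however, it is very difficult to collect enough data to train relation classification models. In recent years, few-shot learning, which needs only a small number of training samples, has emerged in many fields. This paper gives a systematic survey of recent few-shot relation classification models and methods. By metric method, existing methods are divided into prototype-based and distribution-based; by whether additional information is used, models are divided into pre-trained and non-pre-trained. Beyond few-shot learning in the standard setting, the paper also reviews few-shot learning in cross-domain and low-resource scenarios, discusses the limitations of current few-shot relation classification methods, and analyzes the technical challenges of cross-domain few-shot learning. Finally, it looks ahead to future directions for few-shot relation classification.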

多目标情感分类中文数据集构建及分析研究(Construction and Analysis of Chinese Multi-Target Sentiment Classification Dataset)
Pengyuan Liu (刘鹏远) | Yongsheng Tian (田永胜) | Chengyu Du (杜成玉) | Likun Qiu (邱立坤)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

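Target-level sentiment classification aims to determine the sentiment toward a specific opinion target in a sentence. A review sentence often contains several targets, whose sentiments may or may not agree. In existing evaluation datasets for target-level sentiment classification, however, 1) most sentences contain a single target, and 2) among the few sentences with multiple targets, the distribution of sentiment across targets is highly imbalanced, with cases in which all targets share the same sentiment dominating. These flaws in the datasets limit how far models can improve at classifying sentiment for multiple targets. To address this, the paper builds a Chinese dataset for multi-target sentiment classification, with 6,339 manually annotated opinion targets in 2,071 data items. In this dataset, 1) the number of targets per item is balanced, 2) positive and negative polarity are balanced, and 3) the distribution of sentiment across multiple targets is balanced. We then run and compare mainstream multi-target sentiment classification models on the dataset. The results show that current mainstream models cannot yet classify targets well in instances with multiple targets whose sentiments disagree, especially when a target's sentiment is neutral. Multi-target sentiment classification remains a difficult and challenging task.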

基于计量的百年中国人名用字性别特征研究(A Quantified Research on Gender Characteristics of Chinese Names in A Century)
Bingjie Du (杜冰洁) | Pengyuan Liu (刘鹏远) | Yongsheng Tian (田永胜)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

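This paper builds a database of the names of notable Chinese people with more than 110,000 entries. Each entry carries sociocultural labels such as name, gender, and birthplace, together with character-level labels such as pinyin, stroke count, and radical; to our knowledge it is the largest database of real Chinese personal names available for research. From this database we select names from 1919 to the present and use a combination of qualitative and quantitative methods to study the features of the characters used in names, their gender differences, and how they change over time. In terms of name length, male names are longer than female names; in terms of character difficulty, the characters in female names are more complex than those in male names; in terms of character richness, the characters used in names are becoming less varied and more concentrated, and male names show greater character richness than female names. Computing the gender skewness of name characters shows that female names have more gender-exclusive characters. The imagery of the characters used by the two genders differs markedly and changes over time, with the most pronounced change occurring around the reform and opening-up, and the change is more significant for women than for men. In addition, we derive a list of gender-polarized characters in names, lists of high-frequency characters for each period, and tables of trends in character use.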

基于语料库的武侠与仙侠网络小说文体、词汇及主题对比分析(A Corpus-based Contrastive Analysis of Style, Vocabulary and Theme of Wuxia and Xianxia Internet Novels)
Sanle Zhang (张三乐) | Pengyuan Liu (刘鹏远) | Hu Zhang (张虎)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

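Online literature is developing rapidly in China, and its volume and influence are rising year by year. However, no large-scale corpus of online literary works is publicly available, and corpus-based quantitative studies of specific genres of online literature are rare. This paper builds a preliminary corpus of online literature comprising wuxia and xianxia online novels, and uses text metrics, word frequency statistics, and topic mining to compare the two genres in stylistic features, specific vocabulary use, and themes. The comparison shows that the two genres are broadly similar in style, while in vocabulary use and themes they share common ground and each has its own characteristics. Moving from the micro to the macro level and from surface form to content, and combining quantitative statistics with qualitative analysis, we compare wuxia and xianxia online novels from multiple angles and at multiple levels.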

CDCPP:跨领域中文标点符号预测(CDCPP: Cross-Domain Chinese Punctuation Prediction)
Pengyuan Liu (刘鹏远) | Weikang Wang (王伟康) | Likun Qiu (邱立坤) | Bingjie Du (杜冰洁)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

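Punctuation plays a major role in text comprehension. At present, however, punctuation in Chinese text, especially in social media and question-answering text, is very often wrong or missing, which seriously degrades natural language processing such as semantic analysis and machine translation on such text. Existing research on punctuation prediction concentrates on speech transcripts of English conversations; there is little work on punctuation prediction for social media and question-answering text, and no public datasets exist for these domains. This paper first proposes the task of cross-domain Chinese punctuation prediction: a punctuation prediction model is built from large-scale news text, in which punctuation largely follows the standard, and is then used for cross-domain prediction on social media and question-answering text, in which punctuation is not used consistently. We then construct corresponding datasets for three domains: news, social media, and question answering. Finally, we implement a BERT-based baseline model for punctuation prediction and run experiments and analysis on the datasets. The results show that a model trained directly on the news domain performs worse on both social media and question answering; the drop is smaller for question answering and larger for Weibo, at more than 20%, which shows that cross-domain punctuation prediction is a challenging task.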

Sensorimotor Enhanced Neural Network for Metaphor Detection
Mingyu Wan | Baixi Xing | Qi Su | Pengyuan Liu | Chu-Ren Huang
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

伟大的男人和倔强的女人:基于语料库的形容词性别偏度历时研究(Great Males and Stubborn Females: A Diachronic Study of Corpus-Based Gendered Skewness in Chinese Adjectives)
Shucheng Zhu (朱述承) | Pengyuan Liu (刘鹏远)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

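Gender bias is a research focus for scholars in both sociolinguistics and computational linguistics, but most current studies are based on English; few address gender bias in Chinese, and studies based on adjectives in particular are lacking. Yet adjectives are a powerful handle for measuring the role norms that society sets for men and women. This paper first uses a questionnaire to build a dataset of 466 adjectives, defines gender skewness as the degree to which the meaning of a given adjective matches the male or the female group, and computes the gender skewness of every adjective in the dataset. Then, based on the DCC corpus, it studies the overall diachronic change in the gender skewness of adjectives in the People's Daily and examines diachronic changes in the adjectives that co-occur with personal names. We find that the adjectives used in the People's Daily become more neutral overall as time passes but show strongly masculine characteristics during the Cultural Revolution, and that adjectives co-occurring with male names show an overall trend toward neutrality.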

2015

A Corpus-based Comparatively Study on the Semantic Features and Syntactic patterns of Yòu/Hái in Mandarin Chinese
Yuncui Zhang | Pengyuan Liu
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

2010

PKU_HIT: An Event Detection System Based on Instances Expansion and Rich Syntactic Features
Shiqi Li | Pengyuan Liu | Tiejun Zhao | Qin Lu | Hanjing Li
Proceedings of the 5th International Workshop on Semantic Evaluation

PengYuan@PKU: Extracting Infrequent Sense Instance with the Same N-Gram Pattern for the SemEval-2010 Task 15
Peng-Yuan Liu | Shi-Wen Yu | Shui Liu | Tie-Jun Zhao
Proceedings of the 5th International Workshop on Semantic Evaluation

Combining Constituent and Dependency Syntactic Views for Chinese Semantic Role Labeling
Shiqi Li | Qin Lu | Tiejun Zhao | Pengyuan Liu | Hanjing Li
Coling 2010: Posters

Head-modifier Relation based Non-lexical Reordering Model for Phrase-Based Translation
Shui Liu | Sheng Li | Tiejun Zhao | Min Zhang | Pengyuan Liu
Coling 2010: Posters
