Juncai Li
Also published as: 俊材 李
2025
面向对话场景的构式数据集 (A Construction Dataset for Dialogue Scenarios)
Xujing Xue | Juncai Li | Xuefeng Su | Peiyuan Yang | Qinghua Chai | Ru Li
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
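"Large language models (LLMs) have demonstrated strong semantic understanding across a wide range of natural language processing tasks. Existing studies typically evaluate LLMs on various semantic parsing datasets. However, these datasets rarely cover linguistic phenomena that are common in dialogue corpora, such as colloquial expressions and meanings conveyed by specific structures, so they cannot effectively assess the fine-grained semantic understanding of LLMs in dialogue scenarios. To address this, we build a Chinese construction dataset from dialogue corpora, containing 2,146 sentences and 1,748 constructions. It expresses semantic information at a fine granularity while filling the gap left by existing semantic parsing evaluation datasets. Based on this dataset, we select a subset of representative constructions and, drawing on frame semantics, propose two evaluation tasks, construction recognition and construction semantic understanding, to systematically assess how well LLMs recognize constructions and understand their deeper semantics in dialogue scenarios. Experimental results show that current LLMs still fall clearly short in construction recognition, and that without chain-of-thought guidance they struggle to understand the deeper semantics carried by constructions."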
Overview of CCL25-Eval Task2: Chinese Frame Semantic Parsing Evaluation
Hao Xu | Juncai Li | Zhichao Yan | Haikun Liu | Xuefeng Su | ZhangJiaYang ZhangJiaYang | Ru Li
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"Chinese Frame Semantic Parsing (CFSP) aims to extract fine-grained frame semantic structures from text, providing rich semantic information that strengthens natural language understanding models in semantic representation and downstream applications. Building on the CCL-2024 CFSP evaluation task and motivated by how often semantic roles are nested within sentences, we update the nested-role annotation data by labeling all nested semantic roles simultaneously. Based on this enhancement, we publish a more challenging CFSP evaluation task for CCL-2025. The evaluation dataset consists of 22,000 annotated examples involving 703 frames, including nested annotations covering 101 semantic roles. The evaluation task, divided into three subtasks (frame identification, argument identification, and role identification), attracted wide attention from both industry and academia, with a total of 156 participating teams. In the evaluation results, Yongqing Huang from Guangdong Province won first place with a final score of 70.76. In this paper, we report key information about the evaluation task, including key concepts, the evaluation dataset, the top-3 results, and the corresponding methods. More information about this task can be found on the website of the CCL-2025 CFSP evaluation task."
2024
Chinese Frame Semantic Parsing Evaluation
Peiyuan Yang | Juncai Li | Zhichao Yan | Xuefeng Su | Li Ru
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“Chinese Frame-semantic Parsing (CFSP) aims to extract fine-grained frame-semantic structures from texts, providing fine-grained semantic information that helps natural language understanding models enhance their semantic representations. Building on the CCL-2023 CFSP evaluation task, we introduce construction grammar to extend targets, the basic units that evoke frames in text, from words to constructions, and publish a more challenging CFSP evaluation task for CCL-2024. The evaluation dataset consists of 22,000 annotated examples involving nearly 695 frames. The evaluation task is divided into three subtasks (frame identification, argument identification, and role identification) and two tracks (a closed track and an open track). The evaluation task attracted wide attention from both industry and academia, with a total of 1988 participating teams. In the evaluation results, the team from China University of Petroleum won first place in the closed track with a final score of 71.34, while the team from Suzhou University won first place in the open track with a final score of 48.77. In this article, we report key information about the evaluation task, including key concepts, the evaluation dataset, the top-3 results, and the corresponding methods. More information about this task can be found on the website of the CCL-2024 CFSP evaluation task.”
面向“以A为B”构式语义场景的汉语框架识别数据集构建 (Dataset for Recognizing Chinese Semantic Frames based on the Semantic Scenario of the “Yi A Wei B” Construction)
Peiyuan Yang (杨沛渊) | Xuefeng Su (苏雪峰) | Juncai Li (李俊材) | Zhichao Yan (闫智超) | Qinghua Chai (柴清华) | Ru Li (李茹)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
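“Chinese contains many semantic scenarios whose semantic core is expressed not by a single word but by a specific structure in the sentence. However, very few publicly available datasets treat such structures as semantic units. The Chinese frame semantic knowledge base is a high-quality resource for deep semantic analysis and reasoning in Chinese, but the basic unit that evokes a frame in it is currently always a single word in the sentence. Building on this knowledge base and introducing construction grammar, we use the 2020 People's Daily corpus and take the “以A为B” (“Yi A Wei B”) construction as an example to build a Chinese frame identification dataset based on this construction, containing 23,849 example sentences and 141 corresponding frames. We run experiments on this dataset with several Chinese frame identification models and large language models. Traditional frame identification models struggle when the target is a construction, because little target-word information is available; to address this, we propose two methods, one based on target-word transformation and one based on data augmentation, which raise model accuracy to 88.19% and effectively improve the models' ability to uncover the deep semantic information carried by the construction.”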
Inference Helps PLMs’ Conceptual Understanding: Improving the Abstract Inference Ability with Hierarchical Conceptual Entailment Graphs
Juncai Li | Ru Li | Xiaoli Li | Qinghua Chai | Jeff Z. Pan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The abstract inference capability of language models plays a pivotal role in boosting their generalization and reasoning in Natural Language Inference (NLI). Entailment graphs are built precisely for this purpose, focusing on learning entailment relations among predicates. Yet prevailing approaches overlook the *polysemy* and *hierarchical nature of concepts* during entity conceptualization. This oversight disregards how arguments might entail differently across concept levels, thereby missing potential entailment connections. To tackle this, we introduce the *concept pyramid* and propose the HiCon-EG (Hierarchical Conceptual Entailment Graph) framework, which organizes arguments hierarchically and explores entailment relations at diverse concept levels. Learning entailment relations at different concept levels guides the model to better understand concepts and thus improves its abstract inference capabilities. Our method improves the scalability and efficiency of acquiring common-sense knowledge by leveraging statistical language distributions instead of manual labeling. Experimental results show that entailment relations derived from HiCon-EG significantly bolster abstract detection tasks. Our code is available at https://github.com/SXUCFN/HiCon-EG
2023
CCL23-Eval 任务3总结报告:汉语框架语义解析评测(Overview of CCL23-Eval Task 1:Chinese FrameNet Semantic Parsing)
Juncai Li (李俊材) | Zhichao Yan (闫智超) | Xuefeng Su (苏雪峰) | Boxiang Ma (马博翔) | Peiyuan Yang (杨沛渊) | Ru Li (李茹)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
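“The Chinese Frame Semantic Parsing evaluation task aims to improve the ability of machine models to understand fine-grained semantic information. The evaluation dataset includes 20,000 annotated frame semantic parsing examples and information on nearly 700 frames. The task is divided into three subtasks: frame identification, argument span identification, and argument role identification, and the final score is computed by combining the scores of all three. The evaluation attracted wide attention from industry and academia: 55 teams registered, 12 of them submitted results, and we reproduced the results of 5 teams' models. Zuoheng Li (李作恒) from Sichuan ranked first with a score of 71.49. More information about the task, including system submissions, evaluation results, and data resources, is available on the CCL-2023 Chinese Frame Semantic Parsing evaluation website.”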
基于框架语义场景图的零形式填充方法 (A Null Instantiation Filling Method based on Frame Semantic Scenario Graph)
Yuzhi Wang (王俞智) | Ru Li (李茹) | Xuefeng Su (苏雪峰) | Zhichao Yan (闫智超) | Juncai Li (李俊材)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
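“Null instantiation filling aims to find, in the discourse context, the fillers for implicit frame semantic roles in a given sentence. Traditional null instantiation filling methods use pipeline models, which are prone to error propagation and ignore the importance of explicit semantic roles and their fillers. To address these problems, we propose an end-to-end null instantiation filling method that uses Chinese FrameNet information to build a frame semantic scenario graph and models it with a graph attention network (GAT). This yields candidate filler representations that incorporate explicit frame element information and strengthens the model's ability to recognize implicit semantic components in the sentence. Experiments on a Chinese null instantiation filling dataset show that the proposed model improves F1 by 9.16% over a BERT-based baseline, demonstrating the effectiveness of the proposed method.”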