Tan Hongye

Also published as: 红叶


2024

大语言模型故事理解能力评价数据集(Benchmarking story comprehension ability of large language model)
Yan Guohang (闫国航) | Guo Yaxin (郭亚鑫) | Tan Hongye (谭红叶) | Zhang Hu (张虎)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

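“Stories contain a wealth of social, physical, and other commonsense knowledge and convey profound morals, making them an important vehicle for spreading knowledge, passing on culture, and shaping values. Story understanding is an important task in NLP. In recent years, researchers have carried out many evaluations and analyses of the language understanding abilities of large language models (LLMs), but because most existing story understanding datasets consist of entity-type questions whose answers appear in the source text, evaluation and analysis of LLMs' story understanding ability remain very limited. To address this, this paper constructs CRMUS, a fable understanding dataset, and, following the cognitive process of human story understanding (first performing commonsense reasoning, then grasping the moral of the story), designs two tasks to evaluate the corresponding abilities of models. Using the CRMUS dataset, we evaluate several representative LLMs and find that LLMs can already understand and reason about the commonsense in stories fairly well, but that considerable room for improvement remains in understanding story morals. In addition, we use item response theory (IRT) to analyze the quality of the dataset; the analysis shows that the dataset is of high quality and can effectively evaluate LLMs.”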

Evaluation of Commonsense Reasoning and Moral Understanding in Children’s Stories
Yan Guohang | Liang Feihao | Guo Yaxin | Tan Hongye | Li Ru | Zhang Hu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“This paper provides a comprehensive review of the CCL24-Eval Task 8: Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS). The task comprises two sub-tasks, which aim to assess the commonsense reasoning and implicit meaning comprehension capabilities of Large Language Models (LLMs). We received registration forms from 33 teams, 15 of which submitted final results that exceeded the baseline score. We present the results of the top 5 teams and our analysis of these results.”