Liming Xiao
Also published as: 力铭 肖
2025
Overview of CCL25-Eval Task 1: The Fifth Spatial Cognition Evaluation (SpaCE2025)
Yuhang Qin | Liming Xiao | Nan Hu | Sirui Deng | Jingyuan Ma | Hyang Cui | Zihan Zhang | Chi Hsu Tsai | Jinkun Ding | Sumin Kang | Zhifang Sui | Weidong Zhan
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"The Fifth Spatial Cognition Evaluation (SpaCE2025) presents a benchmark aimed at evaluating the spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs), primarily in Chinese. It consists of five subtasks: (1) Retrieving Spatial Referents (RSR), (2) Detecting Spatial Semantic Anomalies (DSA), (3) Recognizing Synonymous Spatial Expressions (RSE), (4) Spatial Position Reasoning (SPR) in Chinese, and (5) SPR in English. The fourth and fifth subtasks share the same content and structure, differing only in language, and are designed to assess the cross-linguistic spatial reasoning capability of LLMs. A total of 12 teams submitted their final results, and the best-performing team achieved an accuracy of 0.7931. The results suggest that while LLMs are capable of handling basic spatial semantic understanding tasks such as RSR, their performance on more complex tasks, such as DSA and RSE, still requires improvement. Additionally, fine-tuning methods that effectively activate LLMs' reasoning ability are essential to improving their performance."
2024
The Fourth Evaluation on Chinese Spatial Cognition
Liming Xiao | Nan Hu | Weidong Zhan | Yuhang Qin | Sirui Deng | Chunhui Sun | Qixu Cai | Nan Li
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“The Fourth Chinese Spatial Cognition Evaluation Task (SpaCE 2024) presents the first comprehensive Chinese benchmark to assess spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs). It comprises five subtasks in the form of multiple-choice questions: (1) identifying spatial semantic roles; (2) retrieving spatial referents; (3) detecting spatial semantic anomalies; (4) recognizing synonymous spatial expressions with different forms; (5) conducting spatial position reasoning. In addition to proposing new tasks, SpaCE 2024 applied a rule-based method to generate high-quality synthetic data with graded difficulty levels for the reasoning task. 12 teams submitted their models and results, and the top-performing team attained an accuracy of 60.24%, suggesting that there is still significant room for current LLMs to improve, especially in tasks requiring high spatial cognitive processing.”
2023
SpaCE2022中文空间语义理解评测任务数据集分析报告(A Quality Assessment Report of the Chinese Spatial Cognition Evaluation Benchmark)
Liming Xiao (力铭 肖) | Chunhui Sun (春晖 孙) | Weidong Zhan (卫东 詹) | Dan Xing (丹 邢) | Nan Li (楠 李) | Chengwen Wang (王诚文) | Fangwei Zhu (方韦 祝)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
“The Second Chinese Spatial Cognition Evaluation (SpaCE2022) aims to test machines' spatial semantic understanding ability and comprises three subtasks: (1) judging the correctness of Chinese spatial semantics; (2) attributing spatial semantic anomalies and identifying the anomalous text in Chinese; (3) recognizing Chinese spatial entities and annotating spatial orientation relations. Focusing on the SpaCE2022 dataset, this paper introduces the annotation guidelines and the dataset construction process, and summarizes the methods used to improve dataset quality: building the STEP annotation scheme to describe spatial semantic information in a standardized way; generating spatially anomalous sentences based on linguistic knowledge to increase data diversity; strengthening quality control through double annotation, rule-based real-time quality checks, and manual sampling review; and managing annotated data by grade so that only high-quality data enters the dataset. By examining the distribution of the dataset as well as machine and human performance, the paper finds that the label distribution of the SpaCE2022 dataset is clearly skewed, and that the correctness-judgment and anomaly-attribution tasks are highly subjective with low inter-annotator agreement; these issues remain to be addressed in the design of future SpaCE tasks.”
CCL23-Eval任务4总结报告:第三届中文空间语义理解评测(Overview of CCL23-Eval Task 4:The 3rd Chinese Spatial Cognition Evaluation)
Liming Xiao (力铭 肖) | Weidong Zhan (卫东 詹) | Zhifang Sui (穗志方) | Yuhang Qin (秦宇航) | Chunhui Sun (春晖 孙) | Dan Xing (丹 邢) | Nan Li (楠 李) | Fangwei Zhu (方韦 祝) | Peiyi Wang (王培懿)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“The Third Chinese Spatial Cognition Evaluation (SpaCE2023) aims to test machines' spatial semantic understanding ability and comprises three subtasks: (1) identifying spatial information anomalies; (2) labeling spatial semantic roles; (3) judging whether spatial scenes are the same or different. Building on SpaCE2022, this edition refined the designs of the first two subtasks and introduced the third as an entirely new evaluation task. In the end, one team submitted results, outperforming the baseline model on the first subtask. The paper also reports the performance of the large language model ChatGPT on the three SpaCE2023 subtasks and, based on the problems observed, suggests directions for improving instruction design.”
2022
Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation
Liming Xiao | Bin Li | Zhixing Xu | Kairui Huo | Minxuan Feng | Junsheng Zhou | Weiguang Qu
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Abstract Meaning Representation is a sentence-level meaning representation that abstracts the meaning of a sentence into a rooted, directed acyclic graph. With the continuous expansion of the Chinese AMR corpus, more and more scholars have developed parsing systems to automatically parse sentences into Chinese AMR. However, current parsers cannot handle concept alignment and relation alignment, and no evaluation method exists for alignment-aware AMR parsing. Therefore, to fill this gap in Chinese AMR parsing evaluation, we build on the AMR evaluation metric smatch and improve its triple-generation algorithm to make it compatible with concept alignment and relation alignment, obtaining a new integrated metric, align-smatch, for parsing evaluation. A comparative study was then conducted on 20 manually annotated AMRs against gold AMRs, showing that align-smatch handles alignments well and is more robust in evaluating arcs. We also put forward fine-grained metrics for evaluating concept alignment, relation alignment, and implicit concepts, in order to further measure parsers' performance on these subtasks.
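As background for the scoring scheme the abstract builds on: smatch-family metrics compare two AMR graphs as sets of triples and report precision/recall/F1 over the overlap. The sketch below is a minimal illustration of that triple-overlap scoring only, assuming the variable alignment between the two graphs is already fixed; real smatch (and align-smatch) additionally search over variable mappings, and the triple contents here are an invented toy example, not data from the paper.

```python
def triple_f1(pred, gold):
    """Precision/recall/F1 over AMR triples, smatch-style, with a
    fixed (pre-chosen) variable mapping between the two graphs."""
    pred, gold = set(pred), set(gold)
    matched = len(pred & gold)
    p = matched / len(pred) if pred else 0.0
    r = matched / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy triples: (relation, source, target); "instance" triples name concepts.
gold = [("instance", "x", "want-01"), ("instance", "y", "boy"), ("ARG0", "x", "y")]
pred = [("instance", "x", "want-01"), ("instance", "y", "girl"), ("ARG0", "x", "y")]
p, r, f = triple_f1(pred, gold)  # one concept triple disagrees
```

Align-smatch, per the abstract, extends the triple-generation step so that concept- and relation-alignment information also enters this comparison.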
2020
Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model
Ning Cheng | Bin Li | Liming Xiao | Changwei Xu | Sijia Ge | Xingyue Hao | Minxuan Feng
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition. Tasks such as lexical analysis must build on sentence segmentation because many ancient books are not punctuated, but step-by-step processing is prone to propagating errors across levels. This paper designs and implements an integrated annotation system for sentence segmentation and lexical analysis. A BiLSTM-CRF neural network model is used to verify the generalization ability and the effect of sentence segmentation and lexical analysis at different label levels on four cross-era test sets. The research shows that this integrated approach improves the F1-scores of sentence segmentation, word segmentation, and part-of-speech tagging for ancient Chinese: across the test sets, the F1-score of sentence segmentation reached 78.95%, an average increase of 3.5%; the F1-score of word segmentation reached 85.73%, an average increase of 0.18%; and the F1-score of part-of-speech tagging reached 72.65%, an average increase of 0.35%.
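The integration idea described above amounts to tagging each character with a fused label that encodes segmentation and sentence-boundary information jointly, so one pass of the BiLSTM-CRF decides both. The sketch below shows only the CRF decoding step (Viterbi) over such fused labels; the label set, scores, and transitions are invented for illustration and are not the paper's actual scheme, and the BiLSTM that would produce the emission scores is omitted.

```python
def viterbi(emissions, transitions, labels):
    """Viterbi decoding over per-character label scores, as a CRF layer
    would do on top of BiLSTM outputs.  emissions[t][l] scores label l
    at position t; transitions[(a, b)] scores moving from a to b."""
    n = len(emissions)
    best = [dict(emissions[0])]  # best[t][l]: best path score ending in l at t
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for cur in labels:
            score, prev = max(
                (best[t - 1][p] + transitions.get((p, cur), -1e9)
                 + emissions[t][cur], p)
                for p in labels
            )
            best[t][cur] = score
            back[t][cur] = prev
    last = max(labels, key=lambda l: best[-1][l])
    path = [last]
    for t in range(n - 1, 0, -1):  # trace back the best path
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical fused labels: word-begin/word-end, with E-S marking a word
# end that also closes a sentence (segmentation + punctuation in one tag).
labels = ["B", "E", "E-S"]
emissions = [{"B": 1.0, "E": 0.0, "E-S": 0.0},
             {"B": 0.0, "E": 0.5, "E-S": 0.8}]
transitions = {("B", "E"): 0.5, ("B", "E-S"): 0.5,
               ("E", "B"): 0.5, ("E-S", "B"): 0.5}
path = viterbi(emissions, transitions, labels)
```

Because the transition scores forbid illegal sequences (e.g. two word-begins in a row), the joint decode keeps segmentation and sentence boundaries mutually consistent, which is the benefit the abstract attributes to integration over pipelining.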