Yixuan Zhang


2023

Overview of CCL23-Eval Task 2: The Third Chinese Abstract Meaning Representation Parsing Evaluation
Zhixing Xu | Yixuan Zhang | Bin Li | Zhou Junsheng | Weiguang Qu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“Abstract Meaning Representation has emerged as a prominent area of research in sentence-level semantic parsing within the field of natural language processing in recent years. Substantial progress has been made in various NLP subtasks through the application of AMR. This paper presents the third Chinese Abstract Meaning Representation Parsing Evaluation, held as part of the Technical Evaluation Task Workshop at the 22nd Chinese Computational Linguistics Conference. The evaluation was specifically tailored for the Chinese and utilized the Align-smatch metric as the standard evaluation criterion. Building upon high-quality semantic annotation schemes and annotated corpora, this evaluation introduced a new test set comprising interrogative sentences for comprehensive evaluation. The results of the evaluation, as measured by the F-score, indicate notable performance achievements. The top-performing team attained a score of 0.8137 in the closed test and 0.8261 in the open test, respectively, using the Align-smatch metric. Notably, the leading result surpassed the SOTA performance at CoNLL 2020 by 3.64 percentage points when evaluated using the MRP metric. Further analysis revealed that this significant progress primarily stemmed from improved relation prediction between concepts. However, the challenge of effectively utilizing semantic relation alignments remains an area that requires further enhancement.”

Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE
Yixuan Zhang | Haonan Li
Proceedings of the Ancient Language Processing Workshop

Large language models (LLMs) have demonstrated exceptional language understanding and generation capabilities. However, their ability to comprehend ancient languages, specifically ancient Chinese, remains largely unexplored. To bridge this gap, we introduce ACLUE, an evaluation benchmark designed to assess the language abilities of models in relation to ancient Chinese. ACLUE consists of 15 tasks that cover a range of skills, including phonetic, lexical, syntactic, semantic, inference and knowledge. By evaluating 8 state-of-the-art multilingual and Chinese LLMs, we have observed a significant divergence in their performance between modern Chinese and ancient Chinese. Among the evaluated models, ChatGLM2 demonstrates the highest level of performance, achieving an average accuracy of 37.45%. We have established a leaderboard for communities to assess their models.