基于对比学习和排名一致性的古代汉语翻译质量评估模型(Ancient Chinese translation quality evaluation model based on contrastive learning and ranking consistency)
Li Huaiming (李怀明)
Shao Yanqiu (邵艳秋)
Li Wei (李炜)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
基于双层语义映射的大语言模型辅助古汉语事件抽取半自动标注框架(A Semi-automatic Annotation Framework for Event Extraction in Classical Chinese Assisted by Large Language Models Based on Two-Layer Semantic Mapping)
Wei Congcong (卫聪聪)
Li Wei (李炜)
Feng Zhenbing (冯振冰)
Shao Yanqiu (邵艳秋)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
A Two-stage Generative Chinese AMR Parsing Method Based on Large Language Models
Shen Zizhuo
Shao Yanqiu
Li Wei
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“The purpose of the CAMR task is to convert natural language into a formalized semantic representation in the form of a graph structure. Due to the complexity of the AMR graph structure, traditional AMR automatic parsing methods often require the design of complex models and strategies. Thanks to the powerful generative capabilities of LLMs, adopting an autoregressive generative approach for AMR parsing has many advantages such as simple modeling and strong extensibility. To further explore the generative AMR automatic parsing technology based on LLMs, we design a two-stage AMR automatic parsing method based on LLMs in this CAMR evaluation. Specifically, we design two pipeline subtasks of alignment-aware node generation and relationship-aware node generation to reduce the difficulty of LLM understanding and generation. Additionally, to boost the system’s transferability, we incorporate a retrieval-augmented strategy during both training and inference phases. The experimental results show that the method we proposed has achieved promising results in this evaluation.”
Classical Chinese Historical Event Detection Evaluation
Feng Zhenbing
Li Wei
Shao Yanqiu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“Event detection involves identifying and extracting event information from natural language texts. The complex syntax and semantics of Classical Chinese, coupled with its limited usage, pose significant challenges for information extraction tasks on classical Chinese texts. At the 23rd China National Conference on Computational Linguistics (CCL 2024), we launched an evaluation task focused on the extraction of historical events from Classical Chinese. We used our constructed Classical Chinese Historical Event Logical Schema to identify event triggers and classify event types. The evaluation utilized the Classical Chinese Historical Event Detection Dataset (CHED), annotated from The Twenty-Four Histories corpus, with the aim of enhancing event extraction technologies and advancing the digital study of classical Chinese historical texts. The evaluation included two subtasks and attracted 28 teams, with 15 teams submitting valid results. In the subtask of trigger identification, the best-performing system achieved an Exact match score of 63.6%. In the subtasks of coarse-grained and fine-grained event type classification, the top systems achieved F1-scores of 84.5% and 81.4%, respectively.”
CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection
Wei Congcong
Feng Zhenbing
Huang Shutan
Li Wei
Shao Yanqiu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
“Event detection (ED) is a crucial area of natural language processing that automates the extraction of specific event types from large-scale text, and studying historical ED in classical Chinese texts helps preserve and inherit historical and cultural heritage by extracting valuable information. However, classical Chinese language characteristics, such as ambiguous word classes and complex semantics, have posed challenges and led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in classical Chinese. To address these issues, we constructed a logical event schema for classical Chinese historical texts and annotated the resulting dataset, which is called classical Chinese Historical Event Dataset (CHED). The main challenges in our work on classical Chinese historical ED are accurately identifying and classifying events within cultural and linguistic contexts and addressing ambiguity resulting from multiple meanings of words in historical texts. Therefore, we have developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple cross-validation is 68.49%, indicating high quality and consistency. We conducted various tasks and comparative experiments on established baseline models for historical ED in classical Chinese. The results showed that BERT+CRF had the best performance on sequence labeling task, with an f1-score of 76.10%, indicating potential for further improvement.”
Exploring Accurate and Generic Simile Knowledge from Pre-trained Language Models
Zhou Shuhan
Ma Longxuan
Shao Yanqiu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
“A simile is an important linguistic phenomenon in daily communication and an important task in natural language processing (NLP). In recent years, pre-trained language models (PLMs) have achieved great success in NLP since they learn generic knowledge from a large corpus. However, PLMs still have hallucination problems that they could generate unrealistic or context-unrelated information. In this paper, we aim to explore more accurate simile knowledge from PLMs. To this end, we first fine-tune a single model to perform three main simile tasks (recognition, interpretation, and generation). In this way, the model gains a better understanding of the simile knowledge. However, this understanding may be limited by the distribution of the training data. To explore more generic simile knowledge from PLMs, we further add semantic dependency features in three tasks. The semantic dependency feature serves as a global signal and helps the model learn simile knowledge that can be applied to unseen domains. We test with seen and unseen domains after training. Automatic evaluations demonstrate that our method helps the PLMs to explore more accurate and generic simile knowledge for downstream tasks. Our method of exploring more accurate knowledge is not only useful for simile study but also useful for other NLP tasks leveraging knowledge from PLMs. Our code and data will be released on GitHub.”