2024
基于双层语义映射的大语言模型辅助古汉语事件抽取半自动标注框架(A Semi-automatic Annotation Framework for Event Extraction in Classical Chinese Assisted by Large Language Models Based)
Wei Congcong (卫聪聪), Li Wei (李炜), Feng Zhenbing (冯振冰), Shao Yanqiu (邵艳秋)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
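“Although natural language processing (NLP) techniques already offer fairly mature solutions for event extraction (EE) in modern languages, research on event extraction in Classical Chinese is held back by the scarcity of annotated data and the semantic complexity of the texts. We therefore propose using large language models (LLMs), which have recently achieved great success, to assist human annotators with data annotation. LLMs are undertrained on Classical Chinese and have limited semantic understanding of it. To address this, we propose an LLM-assisted semi-automatic annotation framework for Classical Chinese event extraction based on two-layer semantic mapping. The framework draws on modern Chinese translations of the Classical Chinese texts and combines event semantics theory with semantic dependency parsing to give the LLM rich semantic representations, which are then progressively mapped from semantic dependency relations onto concrete event information. With review and feedback from human annotators, the framework effectively overcomes the limitations of existing NLP tools and LLMs in annotating Classical Chinese event extraction data. Experimental results show that our method not only improves the accuracy and efficiency of Classical Chinese event extraction annotation but also reduces reliance on domain experts and the manual annotation workload. It offers a new methodology for annotation practice in low-resource languages and explores a new direction for data annotation in the era of large models.”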
Classical Chinese Historical Event Detection Evaluation
Feng Zhenbing, Li Wei, Shao Yanqiu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
“Event detection involves identifying and extracting event information from natural language texts. The complex syntax and semantics of Classical Chinese, coupled with its limited usage, pose significant challenges for information extraction on Classical Chinese texts. At the 23rd China National Conference on Computational Linguistics (CCL 2024), we launched an evaluation task focused on extracting historical events from Classical Chinese. The task uses the Classical Chinese Historical Event Logical Schema that we constructed to identify event triggers and classify event types. The evaluation used the Classical Chinese Historical Event Detection Dataset (CHED), annotated from the Twenty-Four Histories corpus, with the aim of advancing event extraction technology and the digital study of Classical Chinese historical texts. The evaluation comprised two subtasks and attracted 28 teams, 15 of which submitted valid results. In the trigger identification subtask, the best-performing system achieved an exact match score of 63.6%. In the coarse-grained and fine-grained event type classification subtasks, the top systems achieved F1-scores of 84.5% and 81.4%, respectively.”
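The abstract reports an exact match score for trigger identification and F1-scores for event type classification but does not define either metric. The following is a minimal Python sketch of how such scores are commonly computed. It assumes gold and predicted annotations are compared as sets of exactly matching tuples: a character span for triggers, and the same span plus an event type for classification. It is not the evaluation's official scorer. The offsets and the type labels ("war", "appointment", "death") are hypothetical and are not taken from the CHED schema.

```python
from typing import Hashable, Iterable, Tuple


def exact_match_prf(gold: Iterable[Hashable],
                    pred: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over items that match exactly.

    Items are hashable tuples, e.g. (sentence_id, start, end) for trigger
    spans, or (sentence_id, start, end, event_type) for typed events.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)  # only exact matches count
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: (sentence_id, start, end) character offsets.
gold_triggers = [(0, 3, 4), (0, 7, 8), (1, 2, 3)]
pred_triggers = [(0, 3, 4), (0, 6, 8), (1, 2, 3)]  # second span is off by one

# Appending a (hypothetical) event type lets the same function score classification.
gold_typed = [(0, 3, 4, "war"), (0, 7, 8, "appointment"), (1, 2, 3, "death")]
pred_typed = [(0, 3, 4, "war"), (0, 6, 8, "appointment"), (1, 2, 3, "war")]

print(exact_match_prf(gold_triggers, pred_triggers))
print(exact_match_prf(gold_typed, pred_typed))
```

Under this strict matching, a span that is off by a single character counts as both a false positive and a false negative. The same applies to a correct span with the wrong type.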
2023
CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection
Wei Congcong, Feng Zhenbing, Huang Shutan, Li Wei, Shao Yanqiu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
“Event detection (ED) is a crucial area of natural language processing that automates the extraction of specific event types from large-scale text. Studying historical ED in Classical Chinese texts helps preserve and pass on historical and cultural heritage by extracting valuable information. However, characteristics of Classical Chinese, such as ambiguous word classes and complex semantics, have posed challenges and led to a lack of datasets and limited research on event schema construction. In addition, large-scale datasets in English and modern Chinese are not directly applicable to historical ED in Classical Chinese. To address these issues, we constructed a logical event schema for Classical Chinese historical texts and annotated the resulting dataset, which we call the Classical Chinese Historical Event Dataset (CHED). The main challenges in our work on Classical Chinese historical ED are accurately identifying and classifying events within their cultural and linguistic contexts, and resolving the ambiguity that arises from words with multiple meanings in historical texts. We therefore developed a set of annotation guidelines and provided annotators with an objective reference translation. The average Kappa coefficient after multiple rounds of cross-validation is 68.49%, indicating high annotation quality and consistency. We conducted various tasks and comparative experiments with established baseline models for historical ED in Classical Chinese. The results showed that BERT+CRF performed best on the sequence labeling task, with an F1-score of 76.10%, indicating room for further improvement.”
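The abstract reports an average Kappa coefficient but does not say which variant was used or how items were defined. The following is a sketch only. It assumes Cohen's kappa between two annotators, each of whom assigns one hypothetical BIO trigger tag per character. Cohen's kappa corrects raw agreement for the agreement expected by chance from each annotator's label distribution.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # so treat it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-character BIO trigger tags from two annotators.
annotator_a = ["O", "B", "I", "O", "B", "O", "O", "O"]
annotator_b = ["O", "B", "O", "O", "B", "O", "B", "O"]
print(round(cohens_kappa(annotator_a, annotator_b), 4))  # prints 0.5152
```

In this toy case the annotators agree on 6 of 8 characters (0.75), but chance agreement is already 31/64, so kappa drops to about 0.52. With more than two annotators, Fleiss' kappa is the usual generalization.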