The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign

Bin Li, Yiguo Yuan, Jingya Lu, Minxuan Feng, Chao Xu, Weiguang Qu, Dongbo Wang


Abstract
This paper presents the results of the First Ancient Chinese Word Segmentation and POS Tagging Bakeoff (EvaHan), which was held at the Second Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) 2022, in the context of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022). We give the motivation for having an international shared contest, as well as the data and tracks. The contest is consisted of two modalities, closed and open. In the closed modality, the participants are only allowed to use the training data, obtained the highest F1 score of 96.03% and 92.05% in word segmentation and POS tagging. In the open modality, the participants can use whatever resource they have, with the highest F1 score of 96.34% and 92.56% in word segmentation and POS tagging. The scores on the blind test dataset decrease around 3 points, which shows that the out-of-vocabulary words still are the bottleneck for lexical analyzers.
Anthology ID:
2022.lt4hala-1.19
Volume:
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Rachele Sprugnoli, Marco Passarotti
Venue:
LT4HALA
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
135–140
Language:
URL:
https://aclanthology.org/2022.lt4hala-1.19
DOI:
Bibkey:
Cite (ACL):
Bin Li, Yiguo Yuan, Jingya Lu, Minxuan Feng, Chao Xu, Weiguang Qu, and Dongbo Wang. 2022. The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign. In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, pages 135–140, Marseille, France. European Language Resources Association.
Cite (Informal):
The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign (Li et al., LT4HALA 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.lt4hala-1.19.pdf