藏文文本校对评测集构建(Construction of Tibetan Text Proofreading Evaluation Set)

Maocuo San (三毛措), Zhijie Cai (才智杰), Jizaxi Dao (道吉扎西)


Abstract
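Text proofreading evaluation sets are the foundation of spelling-check research. They fall into two kinds: traditional and standard. A traditional evaluation set is obtained by manually injecting artificial errors into a correct dataset based on subjective experience; it is a commonly used way to evaluate text proofreading, but it has many shortcomings. A standard evaluation set is obtained by selecting study subjects and collecting highly credible real-world data from them. Building on an analysis of how English and Chinese text proofreading evaluation sets are constructed, and taking the characteristics of Tibetan into account, this paper studies construction methods for a Tibetan text proofreading evaluation set, builds a standard evaluation set for assessing Tibetan text proofreading performance, and statistically analyzes the types and distribution of errors in the set to verify its validity and usability.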
Anthology ID:
2021.ccl-1.22
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Editors:
Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
229–237
Language:
Chinese
URL:
https://aclanthology.org/2021.ccl-1.22
Cite (ACL):
Maocuo San, Zhijie Cai, and Jizaxi Dao. 2021. 藏文文本校对评测集构建(Construction of Tibetan Text Proofreading Evaluation Set). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 229–237, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
藏文文本校对评测集构建(Construction of Tibetan Text Proofreading Evaluation Set) (San et al., CCL 2021)
PDF:
https://aclanthology.org/2021.ccl-1.22.pdf