2024
Automatic Construction of the English Sentence Pattern Structure Treebank for Chinese ESL learners
Zhu Lin | Xu Meng | Guo Wenya | Yu Jingsi | Yang Liner | Cao Zehuang | Huang Yuan | Yang Erhong
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Analyzing long and complicated sentences has always been a priority and challenge in English learning. In order to conduct the parse of these sentences for Chinese English as Second Language (ESL) learners, we design the English Sentence Pattern Structure (ESPS) based on the Sentence Diagramming theory. Then, we automatically construct the English Sentence Pattern Structure Treebank (ESPST) through the method of rule conversion based on constituency structure and evaluate the conversion results. In addition, we set up two comparative experiments, using trained parser and large language models (LLMs). The results prove that the rule-based conversion approach is effective.”
Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation
Wang Yujie | Huang Chao | Yang Liner | Fang Zhixuan | Huang Yaping | Liu Yang | Yu Jingsi | Yang Erhong
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“This paper introduces a novel crowdsourcing worker selection algorithm, enhancing annotation quality and reducing costs. Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling. The proposed algorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection, and a cost-effective human feedback mechanism. The challenge of dealing with imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled using an innovative data augmentation method termed shifting, expanding, and shrinking (SES). Rigorous testing on CoNLL 2003 NER and Chinese OEI datasets showcased the algorithm’s efficiency, with an increase in F1 score up to 100.04% of the expert-only baseline, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independent test emulating annotation evaluation through a Bernoulli distribution, which still led to an impressive 97.56% F1 score of the expert baseline and 59.88% cost savings. Furthermore, our approach can be seamlessly integrated into Reinforcement Learning from Human Feedback (RLHF) systems, offering a cost-effective solution for obtaining human feedback. All resources, including source code and datasets, are available to the broader research community at https://github.com/blcuicall/nlp-crowdsourcing.”
2020
汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Yu Jingsi (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
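A dependency treebank of learner Chinese provides dependency syntactic analyses of text written by non-native speakers. It can support second language teaching and research, and it is also important for related work such as syntactic parsing of second-language text and grammatical error correction. However, existing dependency treebanks of learner Chinese are few in number, and some problems remain in their annotation. To address this, this paper improves the dependency annotation guidelines, builds an online annotation platform, and carries out dependency annotation of learner Chinese. The paper focuses on issues such as data selection and the annotation workflow, analyzes the quality of the annotation results, and explores the impact of second-language errors on annotation quality and syntactic parsing.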