2024
Automatic Construction of the English Sentence Pattern Structure Treebank for Chinese ESL learners
Lin Zhu | Meng Xu | Wenya Guo | Jingsi Yu | Liner Yang | Zehuang Cao | Yuan Huang | Erhong Yang
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Analyzing long and complicated sentences has always been a priority and challenge in English learning. In order to conduct the parse of these sentences for Chinese English as Second Language (ESL) learners, we design the English Sentence Pattern Structure (ESPS) based on the Sentence Diagramming theory. Then, we automatically construct the English Sentence Pattern Structure Treebank (ESPST) through the method of rule conversion based on constituency structure and evaluate the conversion results. In addition, we set up two comparative experiments, using trained parser and large language models (LLMs). The results prove that the rule-based conversion approach is effective.”
Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation
Yujie Wang | Chao Huang | Liner Yang | Zhixuan Fang | Yaping Huang | Yang Liu | Jingsi Yu | Erhong Yang
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“This paper introduces a novel crowdsourcing worker selection algorithm, enhancing annotation quality and reducing costs. Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling. The proposed algorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection, and a cost-effective human feedback mechanism. The challenge of dealing with imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled using an innovative data augmentation method termed shifting, expanding, and shrinking (SES). Rigorous testing on CoNLL 2003 NER and Chinese OEI datasets showcased the algorithm’s efficiency, with an increase in F1 score up to 100.04% of the expert-only baseline, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independent test emulating annotation evaluation through a Bernoulli distribution, which still led to an impressive 97.56% F1 score of the expert baseline and 59.88% cost savings. Furthermore, our approach can be seamlessly integrated into Reinforcement Learning from Human Feedback (RLHF) systems, offering a cost-effective solution for obtaining human feedback. All resources, including source code and datasets, are available to the broader research community at https://github.com/blcuicall/nlp-crowdsourcing.”
2022
汉语增强依存句法自动转换研究(Transformation of Enhanced Dependencies in Chinese)
Jingsi Yu (余婧思) | Shi Jialu (师佳璐) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Erhong Yang (杨尔弘)
Proceedings of the 21st Chinese National Conference on Computational Linguistics
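“Automatic syntactic parsing is a core task in natural language processing. Because dependency syntax allows each node only one incoming arc, many relations between content words in basic dependency syntax cannot be marked directly with dependency arcs and dependency labels. In addition, the dependency relations in existing dependency schemes leave room for further refinement and enhancement, so that coherent semantic relations can be extracted from them. To address this, building on the Stanford basic dependencies specification, this paper develops an enhanced dependency specification for Chinese. Its main contributions are: the enhancement of prepositions and conjunctions, the propagation of conjuncts, sentence-pattern transformation, and the enhancement of special sentence patterns. The paper also provides a Python-based converter for Chinese enhanced dependencies and a Web-based demo that parses sentences from basic dependency trees into dependency graphs according to the proposed specification. Finally, the paper explores practical applications of enhanced dependencies, discussing collocation extraction and information extraction as examples.”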