Yang Erhong

Also published as: 尔弘


2024

面向语言学习者的跨语言反馈评语生成方法(Cross-Lingual Feedback Comment Generation for Language Learners)
An Jiyuan (安纪元) | Zhu Lin (朱琳) | Yang Erhong (尔弘 杨)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

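“The feedback comment generation task aims to provide corrective and explanatory feedback on language learners' output, supporting the development of their writing ability. Existing research focuses mainly on monolingual feedback comment generation, such as providing English feedback comments to learners of English, but this overlooks the comprehension barriers that non-native learners may face, especially when the comments contain unfamiliar linguistic knowledge. This paper therefore proposes the cross-lingual feedback comment generation task (CLFCG), whose goal is to generate feedback comments in the learner's native language. We construct the first English-Chinese cross-lingual feedback comment generation dataset, which pairs sentences produced by English learners with corresponding Chinese feedback comments, and explore a pipeline-based guided generation method with pre-trained language models that supplies correction edits, cue words, and grammatical terms as additional input to guide and prompt the generation model. Experimental results show that the pre-trained language model pipeline with additional guiding information performs well in both automatic evaluation (BLEU: 50.32) and human evaluation (Precision: 62.84). The paper analyzes the experimental results in depth in order to offer further insight into the cross-lingual feedback comment generation task.”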

Automatic Construction of the English Sentence Pattern Structure Treebank for Chinese ESL learners
Zhu Lin | Xu Meng | Guo Wenya | Yu Jingsi | Yang Liner | Cao Zehuang | Huang Yuan | Yang Erhong
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Analyzing long and complicated sentences has always been a priority and challenge in English learning. In order to conduct the parse of these sentences for Chinese English as a Second Language (ESL) learners, we design the English Sentence Pattern Structure (ESPS) based on the Sentence Diagramming theory. Then, we automatically construct the English Sentence Pattern Structure Treebank (ESPST) through the method of rule conversion based on constituency structure and evaluate the conversion results. In addition, we set up two comparative experiments, using a trained parser and large language models (LLMs). The results prove that the rule-based conversion approach is effective.”

Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation
Wang Yujie | Huang Chao | Yang Liner | Fang Zhixuan | Huang Yaping | Liu Yang | Yu Jingsi | Yang Erhong
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“This paper introduces a novel crowdsourcing worker selection algorithm, enhancing annotation quality and reducing costs. Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling. The proposed algorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection, and a cost-effective human feedback mechanism. The challenge of dealing with imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled using an innovative data augmentation method termed shifting, expanding, and shrinking (SES). Rigorous testing on CoNLL 2003 NER and Chinese OEI datasets showcased the algorithm’s efficiency, with an increase in F1 score up to 100.04% of the expert-only baseline, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independent test emulating annotation evaluation through a Bernoulli distribution, which still led to an impressive 97.56% F1 score of the expert baseline and 59.88% cost savings. Furthermore, our approach can be seamlessly integrated into Reinforcement Learning from Human Feedback (RLHF) systems, offering a cost-effective solution for obtaining human feedback. All resources, including source code and datasets, are available to the broader research community at https://github.com/blcuicall/nlp-crowdsourcing.”

2023

人工智能生成语言与人类语言对比研究——以ChatGPT为例(A Comparative Study of Language between Artificial Intelligence and Human: A Case Study of ChatGPT)
Zhu Junhui (君辉 朱) | Wang Mengyan (梦焰 王) | Yang Erhong (尔弘 杨) | Nie Jinran (锦燃 聂) | Wang Yujie (誉杰 王) | Yue Yan (岩 岳) | Yang Liner (麟儿 杨)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

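“ChatGPT, a chatbot based on natural language generation technology, can generate answers quickly, but how the language of machine-written answers differs from real human language has not yet been studied sufficiently. This study extracts and computes the distribution of 159 linguistic features in human and ChatGPT answers to Chinese open-domain questions, trains AI-text detectors with three machine learning algorithms (random forest, logistic regression, and support vector machine (SVM)), and evaluates model performance. Experimental results show that both random forest and SVM achieve high classification accuracy. Through comparative analysis, the study reveals the strengths and weaknesses of the two kinds of text along five dimensions: descriptive features, character and word commonness, character and word diversity, syntactic complexity, and discourse cohesion. The results show that the differences between the two kinds of text are concentrated mainly in three dimensions: descriptive features, character and word commonness, and character and word diversity.”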

Lexical Complexity Controlled Sentence Generation for Language Learning
Nie Jinran | Yang Liner | Chen Yun | Kong Cunliang | Zhu Junhui | Yang Erhong
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“Language teachers spend a lot of time developing good examples for language learners. For this reason, we define a new task for language learning, lexical complexity controlled sentence generation, which requires precise control over the lexical complexity in the keywords to examples generation and better fluency and semantic consistency. The challenge of this task is to generate fluent sentences only using words of given complexity levels. We propose a simple but effective approach for this task based on complexity embedding while controlling sentence length and syntactic complexity at the decoding stage. Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity. And we demonstrate the feasibility of the approach for both training models from scratch and fine-tuning the pre-trained models. To facilitate the research, we develop two datasets in English and Chinese respectively, on which extensive experiments are conducted. Experimental results show that our approach provides more precise control over lexical complexity, as well as better fluency and diversity.”

CCL23-Eval 任务7总结报告: 汉语学习者文本纠错(Overview of CCL23-Eval Task: Chinese Learner Text Correction)
Hongxiang Chang (鸿翔 常) | Yang Liu (洋 刘) | Meng Xu (萌 徐) | Yingying Wang (莹莹 王) | Cunliang Kong (存良 孔) | Liner Yang (麟儿 杨) | Yang Erhong (尔弘 杨) | Maosong Sun (茂松 孙) | Gaoqi Rao (高琦 饶) | Renfen Hu (韧奋 胡) | Zhenghao Liu (正皓 刘)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

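“The Chinese Learner Text Correction evaluation is a technical shared task held in conjunction with the 22nd Chinese National Conference on Computational Linguistics. Targeting texts written by learners of Chinese, it comprises two tracks: multidimensional Chinese learner text correction and Chinese grammatical error detection. In light of the continuing progress of artificial intelligence technology, each track includes an open task and a closed task, and the open task permits the use of large models. The evaluation datasets are built on YACLC, a multidimensionally annotated corpus of Chinese learner texts. Evaluation criteria based on multiple reference answers are established and a baseline evaluation framework is constructed, to further advance research on Chinese learner text correction. A total of 38 teams registered, of which 5 achieved outstanding results and submitted technical reports.”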