Erhong Yang

Also published as: Erhong YANG


Leveraging Prefix Transfer for Multi-Intent Text Revision
Ruining Chong | Cunliang Kong | Liu Wu | Zhenghao Liu | Ziye Jin | Liner Yang | Yange Fan | Hanghang Fan | Erhong Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Text revision is a necessary process for improving text quality. During this process, writers constantly edit texts with different edit intentions. Identifying the edit intention for a raw text is always ambiguous, and most previous work on revision systems focuses on editing texts according to one specific edit intention. In this work, we aim to build a multi-intent text revision system that can revise texts without explicit intent annotation. Our system is based on prefix-tuning: it first obtains a prefix for every edit intent and then trains a prefix transfer module, enabling the system to selectively leverage the knowledge from various prefixes according to the input text. We conduct experiments on the IteraTeR dataset, and the results show that our system outperforms the baselines. The system improves the SARI score by more than 3%, a gain that stems from the learned edit-intention prefixes.


汉语增强依存句法自动转换研究(Transformation of Enhanced Dependencies in Chinese)
Jingsi Yu (余婧思) | Jialu Shi (师佳璐) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Erhong Yang (杨尔弘)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


句式结构树库的自动构建研究(Automatic Construction of Sentence Pattern Structure Treebank)
Chenhui Xie (谢晨晖) | Zhengsheng Hu (胡正升) | Liner Yang (杨麟儿) | Tianxin Liao (廖田昕) | Erhong Yang (杨尔弘)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


COMPILING: A Benchmark Dataset for Chinese Complexity Controllable Definition Generation
Jiaxin Yuan | Cunliang Kong | Chenhui Xie | Liner Yang | Erhong Yang
Proceedings of the 21st Chinese National Conference on Computational Linguistics

The definition generation task aims to automatically generate a word’s definition within a specific context. However, owing to the lack of datasets covering different complexities, the definitions produced by models tend to stay at the same complexity level. This paper proposes a novel task of generating definitions for a word with controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset that provides detailed information about Chinese definitions, in which each definition is labeled with its complexity level. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset for the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which show that our dataset effectively helps models generate definitions at different complexity levels. We believe that the COMPILING dataset will benefit further research in complexity-controllable definition generation.

CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui | Junhui Zhu | Liner Yang | Xuezhi Fang | Xiaobin Chen | Yujie Wang | Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The construct of linguistic complexity has been widely used in language learning research, and several text analysis tools have been created to analyze it automatically. However, the indexes supported by existing Chinese text analysis tools are limited and differ from tool to tool because each was built for a different research purpose. CTAP is an open-source tool for extracting linguistic complexity measures that can serve a wide range of research purposes. Although it was originally developed for English, the Unstructured Information Management Architecture (UIMA) framework it is built on allows other languages to be integrated. In this study, we integrate a Chinese component into CTAP, describe the index set it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set includes 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. CTAP now supports automatic calculation of complexity features for four languages, with the aim of helping linguists without an NLP background study language complexity.

Multitasking Framework for Unsupervised Simple Definition Generation
Cunliang Kong | Yun Chen | Hengyuan Zhang | Liner Yang | Erhong Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The definition generation task can help language learners by providing explanations for unfamiliar words, and it has attracted much attention in recent years. We propose a novel task of Simple Definition Generation (SDG) to help language learners and low-literacy readers. A significant challenge of this task is the lack of learners’ dictionaries in many languages, and therefore the lack of data for supervised training. We explore this task and propose a multitasking framework, SimpDefiner, that only requires a standard dictionary with complex definitions and a corpus containing arbitrary simple texts. We disentangle the complexity factors from the text by carefully designing a parameter-sharing scheme between two decoders. By jointly training these components, the framework can generate both complex and simple definitions simultaneously. Automatic and manual evaluations on English and Chinese datasets demonstrate that the framework can generate relevant, simple definitions for the target words. Our method outperforms the baseline model by 1.77 SARI points on the English dataset and raises the proportion of low-level (HSK levels 1-3) words in Chinese definitions by 3.87%.

BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling
Cunliang Kong | Yujie Wang | Ruining Chong | Liner Yang | Hengyuan Zhang | Erhong Yang | Yaping Huang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the BLCU-ICALL system used in the Definition Modeling subtrack of SemEval-2022 Task 1: Comparing Dictionaries and Word Embeddings. The system ranked 1st on Italian, 2nd on Spanish and Russian, and 3rd on English and French. We propose a transformer-based multitasking framework for the task. The framework integrates multiple embedding architectures through a cross-attention mechanism and captures the structure of glosses through a masked language modeling objective. We also investigate a simple but effective model ensembling strategy to further improve robustness. The evaluation results show the effectiveness of our solution. We release our code at:


中美学者学术英语写作中词汇难度特征比较研究——以计算语言学领域论文为例(A Comparative Study of the Features of Lexical Sophistication in Academic English Writing by Chinese and American Scholars: A Case Study of Papers in Computational Linguistics)
Yonghui Xie (谢永慧) | Yang Liu (刘洋) | Erhong Yang (杨尔弘) | Liner Yang (杨麟儿)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



面向汉语作为第二语言学习的个性化语法纠错(Personalizing Grammatical Error Correction for Chinese as a Second Language)
Shengsheng Zhang (张生盛) | Guina Pang (庞桂娜) | Liner Yang (杨麟儿) | Chencheng Wang (王辰成) | Yongping Du (杜永萍) | Erhong Yang (杨尔弘) | Yaping Huang (黄雅平)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


基于BERT与柱搜索的中文释义生成(Chinese Definition Modeling Based on BERT and Beam Search)
Qinan Fan (范齐楠) | Cunliang Kong (孔存良) | Liner Yang (杨麟儿) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Jingsi Yu (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications
Erhong YANG | Endong XUN | Baolin ZHANG | Gaoqi RAO
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

Overview of NLPTEA-2020 Shared Task for Chinese Grammatical Error Diagnosis
Gaoqi Rao | Erhong Yang | Baolin Zhang
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

This paper presents the NLPTEA 2020 shared task for Chinese Grammatical Error Diagnosis (CGED), which seeks to identify grammatical error types, their range of occurrence, and recommended corrections within sentences written by learners of Chinese as a foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 30 teams registered for this shared task, 17 developed systems and submitted a total of 43 runs. System performance improved significantly, reaching F1 scores of 91% at the detection level, 40% at the position level, and 28% at the correction level. All data sets with gold standards and scoring scripts are made publicly available to researchers.


The Annotation of Event Schema in Chinese
Hongjian Zou | Erhong Yang | Yan Gao | Qingqing Zeng
Proceedings of the Eighth Workshop on Asian Language Resources


The Research of Word Sense Disambiguation Method Based on Co-occurrence Frequency of HowNet
Erhong Yang | Guoqing Zhang | Yongkui Zhang
Second Chinese Language Processing Workshop