Liner Yang


pdf bib
CCL23-Eval 任务7总结报告: 汉语学习者文本纠错(Overview of CCL23-Eval Task: Chinese Learner Text Correction)
Hongxiang Chang | Yang Liu | Meng Xu | Yingying Wang | Cunliang Kong | Liner Yang | Yang Erhong | Maosong Sun | Gaoqi Rao | Renfen Hu | Zhenghao Liu | 鸿翔 常 | 洋 刘 | 萌 徐 | 莹莹 王 | 存良 孔 | 麟儿 杨 | 尔弘 杨 | 茂松 孙 | 高琦 饶 | 韧奋 胡 | 正皓 刘
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“汉语学习者文本纠错(Chinese Learner Text Correction)评测比赛,是依托于第22届中国计算语言学大会举办的技术评测。针对汉语学习者文本,设置了多维度汉语学习者文本纠错和中文语法错误检测两个赛道。结合人工智能技术的不断进步和发展的时代背景,在两赛道下分别设置开放和封闭任务。开放任务允许使用大模型。以汉语学习者文本多维标注语料库YACLC为基础建设评测数据集,建立基于多参考答案的评价标准,构建基准评测框架,进一步推动汉语学习者文本纠错研究的发展。共38支队伍报名参赛,其中5支队伍成绩优异并提交了技术报告。”

pdf bib
Leveraging Prefix Transfer for Multi-Intent Text Revision
Ruining Chong | Cunliang Kong | Liu Wu | Zhenghao Liu | Ziye Jin | Liner Yang | Yange Fan | Hanghang Fan | Erhong Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Text revision is a necessary process to improve text quality. During this process, writers constantly edit texts out of different edit intentions. Identifying edit intention for a raw text is always an ambiguous work, and most previous work on revision systems mainly focuses on editing texts according to one specific edit intention. In this work, we aim to build a multi-intent text revision system that could revise texts without explicit intent annotation. Our system is based on prefix-tuning, which first gets prefixes for every edit intent, and then trains a prefix transfer module, enabling the system to selectively leverage the knowledge from various prefixes according to the input text. We conduct experiments on the IteraTeR dataset, and the results show that our system outperforms baselines. The system can significantly improve the SARI score with more than 3% improvements, which thrives on the learned editing intention prefixes.


pdf bib
CTAP for Chinese:A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui | Junhui Zhu | Liner Yang | Xuezhi Fang | Xiaobin Chen | Yujie Wang | Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The construct of linguistic complexity has been widely used in language learning research. Several text analysis tools have been created to automatically analyze linguistic complexity. However, the indexes supported by several existing Chinese text analysis tools are limited and different because of different research purposes. CTAP is an open-source linguistic complexity measurement extraction tool, which prompts any research purposes. Although it was originally developed for English, the Unstructured Information Management (UIMA) framework it used allows the integration of other languages. In this study, we integrated the Chinese component into CTAP, describing the index sets it incorporated and comparing it with three linguistic complexity tools for Chinese. The index set includes four levels of 196 linguistic complexity indexes: character level, word level, sentence level, and discourse level. So far, CTAP has implemented automatic calculation of complexity characteristics for four languages, aiming to help linguists without NLP background study language complexity.

pdf bib
BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling
Cunliang Kong | Yujie Wang | Ruining Chong | Liner Yang | Hengyuan Zhang | Erhong Yang | Yaping Huang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the BLCU-ICALL system used in the SemEval-2022 Task 1 Comparing Dictionaries and Word Embeddings, the Definition Modeling subtrack, achieving 1st on Italian, 2nd on Spanish and Russian, and 3rd on English and French. We propose a transformer-based multitasking framework to explore the task. The framework integrates multiple embedding architectures through the cross-attention mechanism, and captures the structure of glosses through a masking language model objective. Additionally, we also investigate a simple but effective model ensembling strategy to further improve the robustness. The evaluation results show the effectiveness of our solution. We release our code at:

pdf bib
汉语增强依存句法自动转换研究(Transformation of Enhanced Dependencies in Chinese)
Jingsi Yu (余婧思) | Shi Jialu (师佳璐) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Erhong Yang (杨尔弘)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


pdf bib
句式结构树库的自动构建研究(Automatic Construction of Sentence Pattern Structure Treebank)
Chenhui Xie (谢晨晖) | Zhengsheng Hu (胡正升) | Liner Yang (杨麟儿) | Tianxin Liao (廖田昕) | Erhong Yang (杨尔弘)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


pdf bib
COMPILING: A Benchmark Dataset for Chinese Complexity Controllable Definition Generation
Jiaxin Yuan | Cunliang Kong | Chenhui Xie | Liner Yang | Erhong Yang
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“The definition generation task aims to generate a word’s definition within a specific context automatically. However, owing to the lack of datasets for different complexities, the definitions produced by models tend to keep the same complexity level. This paper proposes a novel task of generating definitions for a word with controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset given detailed information about Chinese definitions, and each definition is labeled with its complexity levels. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset of the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which illustrates that our dataset plays an outstanding role in assisting models in generating different complexity-level definitions. We believe that the COMPILING dataset will benefit further research in complexity controllable definition generation.”

pdf bib
Multitasking Framework for Unsupervised Simple Definition Generation
Cunliang Kong | Yun Chen | Hengyuan Zhang | Liner Yang | Erhong Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The definition generation task can help language learners by providing explanations for unfamiliar words. This task has attracted much attention in recent years. We propose a novel task of Simple Definition Generation (SDG) to help language learners and low literacy readers. A significant challenge of this task is the lack of learner’s dictionaries in many languages, and therefore the lack of data for supervised training. We explore this task and propose a multitasking framework SimpDefiner that only requires a standard dictionary with complex definitions and a corpus containing arbitrary simple texts. We disentangle the complexity factors from the text by carefully designing a parameter sharing scheme between two decoders. By jointly training these components, the framework can generate both complex and simple definitions simultaneously. We demonstrate that the framework can generate relevant, simple definitions for the target words through automatic and manual evaluations on English and Chinese datasets. Our method outperforms the baseline model by a 1.77 SARI score on the English dataset, and raises the proportion of the low level (HSK level 1-3) words in Chinese definitions by 3.87%.


pdf bib
Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Zhenghao Liu | Xiaoyuan Yi | Maosong Sun | Liner Yang | Tat-Seng Chua
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Grammatical Error Correction (GEC) aims to correct writing errors and help language learners improve their writing skills. However, existing GEC models tend to produce spurious corrections or fail to detect lots of errors. The quality estimation model is necessary to ensure learners get accurate GEC results and avoid misleading from poorly corrected sentences. Well-trained GEC models can generate several high-quality hypotheses through decoding, such as beam search, which provide valuable GEC evidence and can be used to evaluate GEC quality. However, existing models neglect the possible GEC evidence from different hypotheses. This paper presents the Neural Verification Network (VERNet) for GEC quality estimation with multiple hypotheses. VERNet establishes interactions among hypotheses with a reasoning graph and conducts two kinds of attention mechanisms to propagate GEC evidence to verify the quality of generated hypotheses. Our experiments on four GEC datasets show that VERNet achieves state-of-the-art grammatical error detection performance, achieves the best quality estimation results, and significantly improves GEC performance by reranking hypotheses. All data and source codes are available at

pdf bib
中美学者学术英语写作中词汇难度特征比较研究——以计算语言学领域论文为例(A Comparative Study of the Features of Lexical Sophistication in Academic English Writing by Chinese and American)
Yonghui Xie (谢永慧) | Yang Liu (刘洋) | Erhong Yang (杨尔弘) | Liner Yang (杨麟儿)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



pdf bib
面向汉语作为第二语言学习的个性化语法纠错(Personalizing Grammatical Error Correction for Chinese as a Second Language)
Shengsheng Zhang (张生盛) | Guina Pang (庞桂娜) | Liner Yang (杨麟儿) | Chencheng Wang (王辰成) | Yongping Du (杜永萍) | Erhong Yang (杨尔弘) | Yaping Huang (黄雅平)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


pdf bib
基于BERT与柱搜索的中文释义生成(Chinese Definition Modeling Based on BERT and Beam Seach)
Qinan Fan (范齐楠) | Cunliang Kong (孔存良) | Liner Yang (杨麟儿) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


pdf bib
汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Yu Jingsi (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics



pdf bib
The BLCU System in the BEA 2019 Shared Task
Liner Yang | Chencheng Wang
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper describes the BLCU Group submissions to the Building Educational Applications (BEA) 2019 Shared Task on Grammatical Error Correction (GEC). The task is to detect and correct grammatical errors that occurred in essays. We participate in 2 tracks including the Restricted Track and the Unrestricted Track. Our system is based on a Transformer model architecture. We integrate many effective methods proposed in recent years. Such as, Byte Pair Encoding, model ensemble, checkpoints average and spell checker. We also corrupt the public monolingual data to further improve the performance of the model. On the test data of the BEA 2019 Shared Task, our system yields F0.5 = 58.62 and 59.50, ranking twelfth and fourth respectively.


pdf bib
A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing
Xian-Ling Mao | Yi-Jing Hao | Qiang Zhou | Wen-Qing Yuan | Liner Yang | Heyan Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most of previous works only focus on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; meanwhile, it’s hard to assign labels for new emerging topics by using most of existing methods. To solve the problems above, in this paper, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in a probability vector set. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) technology is efficient in boosting the comparison speed among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including new emerging topics.