Liner Yang


2022

pdf bib
Multitasking Framework for Unsupervised Simple Definition Generation
Cunliang Kong | Yun Chen | Hengyuan Zhang | Liner Yang | Erhong Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The definition generation task can help language learners by providing explanations for unfamiliar words. This task has attracted much attention in recent years. We propose a novel task of Simple Definition Generation (SDG) to help language learners and low literacy readers. A significant challenge of this task is the lack of learner’s dictionaries in many languages, and therefore the lack of data for supervised training. We explore this task and propose a multitasking framework SimpDefiner that only requires a standard dictionary with complex definitions and a corpus containing arbitrary simple texts. We disentangle the complexity factors from the text by carefully designing a parameter sharing scheme between two decoders. By jointly training these components, the framework can generate both complex and simple definitions simultaneously. We demonstrate that the framework can generate relevant, simple definitions for the target words through automatic and manual evaluations on English and Chinese datasets. Our method outperforms the baseline model by a 1.77 SARI score on the English dataset, and raises the proportion of the low level (HSK level 1-3) words in Chinese definitions by 3.87%.

2021

pdf bib
中美学者学术英语写作中词汇难度特征比较研究——以计算语言学领域论文为例(A Comparative Study of the Features of Lexical Sophistication in Academic English Writing by Chinese and American)
Yonghui Xie (谢永慧) | Yang Liu (刘洋) | Erhong Yang (杨尔弘) | Liner Yang (杨麟儿)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

学术英语写作在国际学术交流中的作用日益凸显,然而对于英语非母语者,学术英语写作是困难的,为此本文对计算语言领域中美学者学术英语写作中词汇难度特征做比较研究。自构建1132篇中美论文全文语料库,统计语料中484个词汇难度特征值。经过特征筛选与因子分析的降维处理得到表现较好的五个维度。最后计算中美学者论文的维度分从而比较差异,发现美国学者的论文相较中国学者的论文中词汇单位更具常用性、二元词串更具稳固性、三元词串更具稳固性、虚词更具复杂性、词类更具关联性。主要原因在于统计特征值时借助的外部资源库与美国学者的论文更贴近,且中国学者没有完全掌握该领域学术写作的习惯。因此,中国学者可充分利用英语本族语者构建的资源库,从而产出更为地道与流利的学术英语论文。

pdf bib
Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Zhenghao Liu | Xiaoyuan Yi | Maosong Sun | Liner Yang | Tat-Seng Chua
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Grammatical Error Correction (GEC) aims to correct writing errors and help language learners improve their writing skills. However, existing GEC models tend to produce spurious corrections or fail to detect lots of errors. The quality estimation model is necessary to ensure learners get accurate GEC results and avoid misleading from poorly corrected sentences. Well-trained GEC models can generate several high-quality hypotheses through decoding, such as beam search, which provide valuable GEC evidence and can be used to evaluate GEC quality. However, existing models neglect the possible GEC evidence from different hypotheses. This paper presents the Neural Verification Network (VERNet) for GEC quality estimation with multiple hypotheses. VERNet establishes interactions among hypotheses with a reasoning graph and conducts two kinds of attention mechanisms to propagate GEC evidence to verify the quality of generated hypotheses. Our experiments on four GEC datasets show that VERNet achieves state-of-the-art grammatical error detection performance, achieves the best quality estimation results, and significantly improves GEC performance by reranking hypotheses. All data and source codes are available at https://github.com/thunlp/VERNet.

2020

pdf bib
面向汉语作为第二语言学习的个性化语法纠错(Personalizing Grammatical Error Correction for Chinese as a Second Language)
Shengsheng Zhang (张生盛) | Guina Pang (庞桂娜) | Liner Yang (杨麟儿) | Chencheng Wang (王辰成) | Yongping Du (杜永萍) | Erhong Yang (杨尔弘) | Yaping Huang (黄雅平)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

语法纠错任务旨在通过自然语言处理技术自动检测并纠正文本中的语序、拼写等语法错误。当前许多针对汉语的语法纠错方法已取得较好的效果,但往往忽略了学习者的个性化特征,如二语等级、母语背景等。因此,本文面向汉语作为第二语言的学习者,提出个性化语法纠错,对不同特征的学习者所犯的错误分别进行纠正,并构建了不同领域汉语学习者的数据集进行实验。实验结果表明,将语法纠错模型适应到学习者的各个领域后,性能得到明显提升。

pdf bib
基于BERT与柱搜索的中文释义生成(Chinese Definition Modeling Based on BERT and Beam Seach)
Qinan Fan (范齐楠) | Cunliang Kong (孔存良) | Liner Yang (杨麟儿) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

释义生成任务是指为一个目标词生成相应的释义。前人研究中文释义生成任务时未考虑目标词的上下文,本文首次在中文释义生成任务中使用了目标词的上下文信息,并提出了一个基于BERT与柱搜索的释义生成模型。本文构建了包含上下文的CWN中文数据集用于开展实验,除了BLEU指标之外,还使用语义相似度作为额外的自动评价指标,实验结果显示本文模型在中文CWN数据集和英文Oxford数据集上均有显著提升,人工评价结果也与自动评价结果一致。最后,本文对生成实例进行了深入分析。

pdf bib
汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Yu Jingsi (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

汉语学习者依存句法树库为非母语者语料提供依存句法分析,可以支持第二语言教学与研究,也对面向第二语言的句法分析、语法改错等相关研究具有重要意义。然而,现有的汉语学习者依存句法树库数量较少,且在标注方面仍存在一些问题。为此,本文改进依存句法标注规范,搭建在线标注平台,并开展汉语学习者依存句法标注。本文重点介绍了数据选取、标注流程等问题,并对标注结果进行质量分析,探索二语偏误对标注质量与句法分析的影响。

2019

pdf bib
The BLCU System in the BEA 2019 Shared Task
Liner Yang | Chencheng Wang
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper describes the BLCU Group submissions to the Building Educational Applications (BEA) 2019 Shared Task on Grammatical Error Correction (GEC). The task is to detect and correct grammatical errors that occurred in essays. We participate in 2 tracks including the Restricted Track and the Unrestricted Track. Our system is based on a Transformer model architecture. We integrate many effective methods proposed in recent years. Such as, Byte Pair Encoding, model ensemble, checkpoints average and spell checker. We also corrupt the public monolingual data to further improve the performance of the model. On the test data of the BEA 2019 Shared Task, our system yields F0.5 = 58.62 and 59.50, ranking twelfth and fourth respectively.

2016

pdf bib
A Novel Fast Framework for Topic Labeling Based on Similarity-preserved Hashing
Xian-Ling Mao | Yi-Jing Hao | Qiang Zhou | Wen-Qing Yuan | Liner Yang | Heyan Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most of previous works only focus on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; meanwhile, it’s hard to assign labels for new emerging topics by using most of existing methods. To solve the problems above, in this paper, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in a probability vector set. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) technology is efficient in boosting the comparison speed among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including new emerging topics.