Renfen Hu


古汉语通假字资源库的构建及应用研究(The Construction and Application of an Ancient Chinese Language Resource on Tongjiazi)
Zhaoji Wang (王兆基) | Shirui Zhang (张诗睿) | Xuetao Zhang (张学涛) | Renfen Hu (胡韧奋)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics


CCL23-Eval 任务7总结报告: 汉语学习者文本纠错(Overview of CCL23-Eval Task: Chinese Learner Text Correction)
Hongxiang Chang (常鸿翔) | Yang Liu (刘洋) | Meng Xu (徐萌) | Yingying Wang (王莹莹) | Cunliang Kong (孔存良) | Liner Yang (杨麟儿) | Erhong Yang (杨尔弘) | Maosong Sun (孙茂松) | Gaoqi Rao (饶高琦) | Renfen Hu (胡韧奋) | Zhenghao Liu (刘正皓)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

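The Chinese Learner Text Correction evaluation is a technical evaluation held in conjunction with the 22nd Chinese National Conference on Computational Linguistics. Targeting texts written by learners of Chinese, it comprises two tracks: multidimensional Chinese learner text correction and Chinese grammatical error detection. In light of ongoing advances in artificial intelligence, each track offers an open task and a closed task, and the open task permits the use of large models. The evaluation datasets are built on YACLC, a multidimensionally annotated corpus of Chinese learner texts. The evaluation establishes criteria based on multiple reference answers and constructs a benchmark evaluation framework, with the aim of further advancing research on Chinese learner text correction. In total, 38 teams registered, of which 5 achieved excellent results and submitted technical reports.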


古汉语词义标注语料库的构建及应用研究(The Construction and Application of Ancient Chinese Corpus with Word Sense Annotation)
Lei Shu (舒蕾) | Yiluan Guo (郭懿鸾) | Huiping Wang (王慧萍) | Xuetao Zhang (张学涛) | Renfen Hu (胡韧奋)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



An Intelligent Testing Strategy for Vocabulary Assessment of Chinese Second Language Learners
Wei Zhou | Renfen Hu | Feipeng Sun | Ronghuai Huang
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Vocabulary is one of the most important parts of language competence. Testing of vocabulary knowledge is central to research on reading and language. However, it usually costs a large amount of time and human labor to build an item bank and to test a large number of students. In this paper, we propose a novel testing strategy that combines automatic item generation (AIG) and computerized adaptive testing (CAT) in vocabulary assessment for Chinese L2 learners. First, we generate three types of vocabulary questions by modeling both the vocabulary knowledge and learners’ writing error data. After evaluation and calibration, we construct a balanced item pool with automatically generated items, and implement a three-parameter computerized adaptive test. We conduct manual item evaluation and online student tests in the experiments. The results show that the combination of AIG and CAT can construct test items efficiently and reduce test cost significantly. In addition, the test results of CAT can provide valuable feedback to AIG algorithms.
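As a rough illustration of what a "three-parameter computerized adaptive test" typically involves, the sketch below assumes the standard three-parameter logistic (3PL) item response model and maximum-information item selection. The abstract does not state the exact model or selection rule used in the paper, so the function names, the parameter tuple layout `(a, b, c)`, and the toy item pool are all hypothetical.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability that a learner with ability theta answers correctly.
    a = discrimination, b = difficulty, c = guessing (assumed parameter names)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def pick_next_item(theta, pool, used):
    """Return the index of the unused item that is most informative at theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

# Hypothetical calibrated pool: (a, b, c) per item.
pool = [(1.0, -2.0, 0.2), (1.0, 0.0, 0.2), (1.0, 2.0, 0.2)]
first = pick_next_item(0.0, pool, used=set())
second = pick_next_item(0.0, pool, used={first})
```

In a full adaptive test, the ability estimate `theta` would be updated after each response before the next item is chosen; that step is omitted here.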

Diachronic Sense Modeling with Deep Contextualized Word Embeddings: An Ecological View
Renfen Hu | Shen Li | Shichen Liang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Diachronic word embeddings have been widely used in detecting temporal changes. However, existing methods face the meaning conflation deficiency by representing a word as a single vector at each time period. To address this issue, this paper proposes a sense representation and tracking framework based on deep contextualized embeddings, aiming at answering not only what and when, but also how the word meaning changes. The experiments show that our framework is effective in representing fine-grained word senses, and that it brings a significant improvement in the word change detection task. Furthermore, we model word change from an ecological viewpoint, and sketch two interesting sense behaviors in the process of language evolution, i.e., sense competition and sense cooperation.


From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information
Hengru Xu | Shen Li | Renfen Hu | Si Li | Sheng Gao
Proceedings of the 22nd Conference on Computational Natural Language Learning

Dropout is used to avoid overfitting by randomly dropping units from neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method that integrates global information to improve neural networks for text classification. Unlike the traditional dropout method, in which units are dropped randomly with the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GI-Dropout, the model is supposed to pay more attention to less apparent features or patterns. Experiments demonstrate the effectiveness of dropout with global information on seven text classification tasks, including sentiment analysis and topic classification.

Analogical Reasoning on Chinese Morphological and Semantic Relations
Shen Li | Zhe Zhao | Renfen Hu | Wensi Li | Tao Liu | Xiaoyong Du
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Analogical reasoning is effective in capturing linguistic regularities. This paper proposes an analogical reasoning task on Chinese. After delving into Chinese lexical knowledge, we sketch 68 implicit morphological relations and 28 explicit semantic relations. A large and balanced dataset, CA8, is then built for this task, comprising 17,813 questions. Furthermore, we systematically explore the influences of vector representations, context features, and corpora on analogical reasoning. In the experiments, CA8 proves to be a reliable benchmark for evaluating Chinese word embeddings.
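For readers unfamiliar with how an analogy question ("a is to b as c is to ?") is scored against word embeddings, the sketch below uses the common vector-offset method: the answer is the vocabulary word whose vector is closest, by cosine similarity, to `b - a + c`. This is an assumption about the general technique, not a statement of the paper's exact evaluation procedure; the function names and the two-dimensional toy vectors are hypothetical and are not taken from CA8.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def solve_analogy(a, b, c, vectors):
    """Answer 'a : b :: c : ?' with the vector-offset method.
    The three question words are excluded from the candidate answers."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy embeddings, invented purely for illustration.
vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}
answer = solve_analogy("man", "woman", "king", vectors)
```

A benchmark score is then simply the fraction of questions for which the returned word matches the gold answer.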


Initializing Convolutional Filters with Semantic Features for Text Classification
Shen Li | Zhe Zhao | Tao Liu | Renfen Hu | Xiaoyong Du
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Convolutional Neural Networks (CNNs) are widely used in NLP tasks. This paper presents a novel weight initialization method to improve the CNNs for text classification. Instead of randomly initializing the convolutional filters, we encode semantic features into them, which helps the model focus on learning useful features at the beginning of the training. Experiments demonstrate the effectiveness of the initialization technique on seven text classification tasks, including sentiment analysis and topic classification.


The Construction of a Chinese Collocational Knowledge Resource and Its Application for Second Language Acquisition
Renfen Hu | Jiayong Chen | Kuang-hua Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The appropriate use of collocations is a challenge in second language acquisition. However, high-quality and easily accessible Chinese collocation resources are not available to either teachers or students. This paper presents the design and construction of a large-scale resource of Chinese collocational knowledge, and a web-based application (OCCA, Online Chinese Collocation Assistant) that offers a free and convenient collocation search service to end users. We define and classify collocations based on practical language acquisition needs and use a syntax-based method to extract nine types of collocations. In total, 37 extraction rules are compiled with word, POS and dependency relation features; 1,750,000 collocations are extracted from a corpus for L2 learning and complementary Wikipedia data; and OCCA is implemented on top of these extracted collocations. Comparing OCCA with two traditional collocation dictionaries, we find that OCCA has higher entry coverage and collocation quantity, and that our method achieves a low error rate of less than 5%. We also discuss how to apply collocational knowledge to grammatical error detection and demonstrate performance comparable to the best results in the 2015 NLP-TEA CGED shared task. The preliminary experiment shows that collocation knowledge is helpful in detecting all four types of grammatical errors.


A hybrid system for Chinese-English patent machine translation
Hongzheng Li | Kai Zhao | Renfen Hu | Yun Zhu | Yaohong Jin
Proceedings of the 6th Workshop on Patent and Scientific Literature Translation


Pre-reordering Model of Chinese Special Sentences for Patent Machine Translation
Renfen Hu | Zhiying Liu | Lijiao Yang | Yaohong Jin
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language