Kuang-hua Chen

Also published as: Kuang-Hua Chen


2016

The appropriate use of collocations is a challenge for second language acquisition. However, high quality and easily accessible Chinese collocation resources are not available for both teachers and students. This paper presents the design and construction of a large scale resource of Chinese collocational knowledge, and a web-based application (OCCA, Online Chinese Collocation Assistant) which offers free and convenient collocation search service to end users. We define and classify collocations based on practical language acquisition needs and utilize a syntax based method to extract nine types of collocations. Totally 37 extraction rules are compiled with word, POS and dependency relation features, 1,750,000 collocations are extracted from a corpus for L2 learning and complementary Wikipedia data, and OCCA is implemented based on these extracted collocations. By comparing OCCA with two traditional collocation dictionaries, we find OCCA has higher entry coverage and collocation quantity, and our method achieves quite low error rate at less than 5%. We also discuss how to apply collocational knowledge to grammatical error detection and demonstrate comparable performance to the best results in 2015 NLP-TEA CGED shared task. The preliminary experiment shows that the collocation knowledge is helpful in detecting all the four types of grammatical errors.

2014

2013

2012

2011

2010

2009

2007

2000

1999

1998

Automatic summarization and information extraction are two important Internet services. MUC and SUMMAC play their appropriate roles in the next generation Internet. This paper focuses on the automatic summarization and proposes two different models to extract sentences for summary generation under two tasks initiated by SUMMAC-1. For categorization task, positive feature vectors and negative feature vectors are used cooperatively to construct generic, indicative summaries. For adhoc task, a text model based on relationship between nouns and verbs is used to filter out irrelevant discourse segment, to rank relevant sentences, and to generate the user-directed summaries. The result shows that the NormF of the best summary and that of the fixed summary for adhoc tasks are 0.456 and 0.447. The NormF of the best summary and that of the fixed summary for categorization task are 0.4090 and 0.4023. Our system outperforms the average system in categorization task but does a common job in adhoc task.

1996

1995

1994

1993