Luning Ji
2006
A Study on Terminology Extraction Based on Classified Corpora
Yirong Chen | Qin Lu | Wenjie Li | Zhifang Sui | Luning Ji
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Algorithms for automatic term extraction in a specific domain should consider at least two issues: unithood and termhood (Kageura, 1996). Unithood is the degree to which a string occurs as a word or a phrase. Termhood (Chen Yirong, 2005) is the degree to which a word or a phrase occurs as a domain-specific concept. Unlike unithood, termhood has not yet been widely studied. In classified corpora, class information provides a cue to the nature of the data and can be used in termhood calculation. Three algorithms are presented and evaluated to investigate termhood based on classified corpora. They are based, respectively, on lexicon set computing, on term frequency and document frequency, and on the strength of the relation between a term and its document class. Our objective is to investigate the effects of these different termhood measurement features. The evaluation is intended to show which features are more effective and how each feature can be improved to achieve the best performance. Preliminary results show that the first measure can effectively filter out independent terms or terms of general use.
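The abstract does not spell out how the lexicon-set measure is computed, so the following is only a minimal sketch of one plausible reading: build one lexicon (a set of words) per document class, then keep the words of a target class that appear in no other class lexicon, which drops words of general use. The function names `class_lexicons` and `class_specific_terms`, the class labels, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def class_lexicons(docs):
    """Map each class label to the set of words seen in its documents.

    `docs` is an iterable of (class_label, tokens) pairs (an assumed format).
    """
    lexicons = defaultdict(set)
    for label, tokens in docs:
        lexicons[label].update(tokens)
    return dict(lexicons)


def class_specific_terms(lexicons, target):
    """Return words in the target lexicon that occur in no other class lexicon."""
    others = set()
    for label, words in lexicons.items():
        if label != target:
            others |= words
    # Set difference removes words shared with any other class,
    # i.e. candidates that look like general-use vocabulary.
    return lexicons[target] - others


# Toy classified corpus (hypothetical data for illustration only).
docs = [
    ("IT", ["compiler", "system", "kernel"]),
    ("IT", ["kernel", "method"]),
    ("Finance", ["bond", "system", "method"]),
]
lexicons = class_lexicons(docs)
it_terms = class_specific_terms(lexicons, "IT")
finance_terms = class_specific_terms(lexicons, "Finance")
```

In this toy corpus, "system" and "method" occur in both classes and are filtered out, while "compiler" and "kernel" remain as IT candidates. A strict set difference is a deliberately simple choice; a softer variant could instead count how many class lexicons contain each word.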
A Comparative Study of the Effect of Word Segmentation On Chinese Terminology Extraction
Luning Ji | Qin Lu | Wenjie Li | YiRong Chen
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation