Automatic Acquisition of Chinese Novel Noun Compounds

Meng Wang; Chu-Ren Huang; Shiwen Yu; Weiwei Sun

Abstract

Automatic acquisition of novel compounds is notoriously difficult because most novel compounds occur with relatively low frequency in a corpus. This study proposes a new method to address that challenge. We model the task as a two-class classification problem in which each candidate is classified as either a compound or a non-compound. An SVM-based machine learning method incorporating two types of linguistically motivated features, semantic features and character features, is applied to identify rare but valid noun compounds. We explore two kinds of training data: virtual training data, obtained from the frequent compounds using three statistical scores (co-occurrence frequency, mutual information, and dependent ratio), and real training data, randomly selected from the infrequent compounds. Comparative experiments show that, even with limited direct evidence in the corpus for the novel compounds, the typical frequent compounds can be fully exploited to help discover them.
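To make the "virtual training data" idea concrete, the sketch below scores adjacent word pairs by two of the three statistics named in the abstract, co-occurrence frequency and (pointwise) mutual information, and keeps the pairs that pass both thresholds. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy token list, and the threshold values are hypothetical, and the dependent ratio is left out because the abstract does not define it.

```python
import math
from collections import Counter


def score_candidates(tokens):
    """Map each adjacent pair (w1, w2) to (co-occurrence count, PMI in bits).

    Assumption: candidates are simply adjacent tokens in a segmented corpus;
    the paper's actual candidate extraction is not described in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # Pointwise mutual information: log2 of observed vs. independent probability.
        scores[(w1, w2)] = (count, math.log2(p_pair / (p_w1 * p_w2)))
    return scores


def select_virtual_training_data(scores, min_count, min_pmi):
    """Keep pairs that are both frequent and strongly associated.

    The thresholds are hypothetical; the abstract does not report the values used.
    """
    return sorted(
        pair
        for pair, (count, pmi) in scores.items()
        if count >= min_count and pmi >= min_pmi
    )


# Toy segmented corpus (hypothetical data, for illustration only).
tokens = ["语言", "资源", "建设", "语言", "资源", "评测", "语言", "模型"]
scores = score_candidates(tokens)
positives = select_virtual_training_data(scores, min_count=2, min_pmi=1.0)
```

In the setup the abstract describes, pairs selected this way would serve as training examples for the SVM, which then classifies the infrequent candidates using semantic and character features.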

Anthology ID: L10-1257
Volume: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month: May
Year: 2010
Address: Valletta, Malta
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/377_Paper.pdf
Cite (ACL): Meng Wang, Chu-Ren Huang, Shiwen Yu, and Weiwei Sun. 2010. Automatic Acquisition of Chinese Novel Noun Compounds. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal): Automatic Acquisition of Chinese Novel Noun Compounds (Wang et al., LREC 2010)