Detection of Peculiar Word Sense by Distance Metric Learning with Labeled Examples

Minoru Sasaki, Hiroyuki Shinnou


Abstract
In natural language processing, detecting peculiar usages of words is particularly useful for constructing dictionaries and datasets for word sense disambiguation. Hence, a method is needed to detect peculiar examples of a target word in a corpus. Here, a peculiar example is defined as an instance in which the target word or phrase has a new meaning. In this paper, we propose a new peculiar example detection method that uses distance metric learning from labeled example pairs. First, distance metric learning is performed on the training data by large margin nearest neighbor classification, and new training data points are generated using the distance metric in the original space. Then, peculiar examples are extracted from the updated training data and the test data using the local outlier factor, a density-based outlier detection method. The effectiveness of the proposed method was evaluated on an artificial dataset and the SemEval-2010 Japanese WSD task dataset. The results showed that the proposed method achieved the highest number of correctly detected instances and the highest F-measure. This indicates that the label information of the training data is effective for density-based peculiar example detection. Moreover, an experiment on outlier detection using a classification method such as SVM showed that it is difficult to apply classification methods to outlier detection.
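The two-stage idea in the abstract (learn a metric from labeled examples, then score test examples by local density) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple per-feature within-class scaling as a stand-in for large margin nearest neighbor learning, it omits the step that generates new training points, and it uses a simplified local outlier factor that takes exactly k nearest neighbors. The names `within_class_scale`, `transform`, and `lof_scores`, as well as the toy 2-D data, are hypothetical.

```python
import math


def within_class_scale(X, y):
    """Stand-in for LMNN (assumption): scale each feature by the inverse
    of its pooled within-class standard deviation."""
    dims = len(X[0])
    sq = [0.0] * dims
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        mean = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        for r in rows:
            for d in range(dims):
                sq[d] += (r[d] - mean[d]) ** 2
    return [1.0 / math.sqrt(sq[d] / len(X) + 1e-9) for d in range(dims)]


def transform(X, scale):
    """Apply the learned per-feature scaling to every point."""
    return [[v * s for v, s in zip(x, scale)] for x in X]


def lof_scores(points, k):
    """Simplified local outlier factor: uses exactly k nearest neighbors
    (ties at the k-distance are not expanded). Scores well above 1
    indicate points in sparser regions than their neighbors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / (sum(reach) + 1e-12))
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data (hypothetical): two known senses form two labeled clusters.
train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0],
         [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.0, 5.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Two test examples near known senses and one far from both.
test = [[0.15, 0.1], [5.1, 5.0], [2.5, 9.0]]

scale = within_class_scale(train, labels)
all_points = transform(train + test, scale)
scores = lof_scores(all_points, k=3)
test_scores = scores[len(train):]
peculiar_index = max(range(len(test)), key=lambda i: test_scores[i])
print(peculiar_index, [round(s, 2) for s in test_scores])
```

In this toy setting the third test example, which lies far from both labeled clusters, receives the largest score and would be flagged as the candidate peculiar example. With real data, a library implementation of metric learning and of the local outlier factor would likely be preferable to this hand-written version.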
Anthology ID:
L12-1327
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
601–604
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/578_Paper.pdf
Cite (ACL):
Minoru Sasaki and Hiroyuki Shinnou. 2012. Detection of Peculiar Word Sense by Distance Metric Learning with Labeled Examples. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 601–604, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Detection of Peculiar Word Sense by Distance Metric Learning with Labeled Examples (Sasaki & Shinnou, LREC 2012)