Identification of Rare & Novel Senses Using Translations in a Parallel Corpus

Richard Schwarz, Hinrich Schütze, Fabienne Martin, Achim Stein


Abstract
The identification of rare and novel senses is a challenge in lexicography. In this paper, we present a new method for finding such senses using a word aligned multilingual parallel corpus. We use the Europarl corpus and therein concentrate on French verbs. We represent each occurrence of a French verb as a high dimensional term vector. The dimensions of such a vector are the possible translations of the verb according to the underlying word alignment. The dimensions are weighted by a weighting scheme to adjust to the significance of any particular translation. After collecting these vectors we apply forms of the K-means algorithm on the resulting vector space to produce clusters of distinct senses, so that standard uses produce large homogeneous clusters while rare and novel uses appear in small or heterogeneous clusters. We show in a qualitative and quantitative evaluation that the method can successfully find rare and novel senses.
Anthology ID:
L10-1283
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/411_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Richard Schwarz, Hinrich Schütze, Fabienne Martin, and Achim Stein. 2010. Identification of Rare & Novel Senses Using Translations in a Parallel Corpus. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Identification of Rare & Novel Senses Using Translations in a Parallel Corpus (Schwarz et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/411_Paper.pdf