Automatic Acquisition for low frequency lexical items

Núria Bel, Sergio Espeja, Montserrat Marimon


Abstract
This paper addresses a specific case of the task of lexical acquisition understood as the induction of information about the linguistic characteristics of lexical items on the basis of information gathered from their occurrences in texts. Most of the recent works in the area of lexical acquisition have used methods that take as much textual data as possible as source of evidence, but their performance decreases notably when only few occurrences of a word are available. The importance of covering such low frequency items lies in the fact that a large quantity of the words in any particular collection of texts will be occurring few times, if not just once. Our work proposes to compensate the lack of information resorting to linguistic knowledge on the characteristics of lexical classes. This knowledge, obtained from a lexical typology, is formulated probabilistically to be used in a Bayesian method to maximize the information gathered from single occurrences as to predict the full set of characteristics of the word. Our results show that our method achieves better results than others for the treatment of low frequency items.
Anthology ID:
L08-1099
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/334_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Núria Bel, Sergio Espeja, and Montserrat Marimon. 2008. Automatic Acquisition for low frequency lexical items. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Acquisition for low frequency lexical items (Bel et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/334_paper.pdf