Dealing with unknown words by simple decomposition: feasibility studies with Italian prefixes.

Bruno Cartoni


Abstract
In this article, we present an experiment that evaluates the feasibility of using superficial morphological analysis to analyse unknown constructed neologisms. Lexical incompleteness is a real problem for any morphosyntactic analyser. It is partly due to lexical creativity, and especially to the productivity of some morphological processes. We present a set of word formation rules, based on constructional morphology principles, that can be used to improve the performance of an Italian morphosyntactic analyser. These rules use only simple computing techniques to ensure efficiency, because any improvement in coverage must not slow down the entire system. In the second part of this paper, we describe a method for constraining the rules and evaluate these constraints in terms of performance. The constraints greatly reduce the number of incorrect analyses of unknown neologisms (“noise”), although at the cost of some increase in “silence” (correct analyses that are no longer produced). This classic trade-off between “noise” and “silence” can hardly be avoided, and we believe that this experiment successfully demonstrates the feasibility of superficial analysis in improving performance and points the way to other avenues of research.
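The idea of decomposing an unknown word into a known prefix and a known base, and then constraining that decomposition, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the prefix list, the toy lexicon, the category-based constraint table and the function name `decompose` are hypothetical and are not taken from the paper, whose actual rules and constraints are not given in this abstract.

```python
# Hypothetical toy data: illustrative only, not the paper's rules or lexicon.
PREFIXES = ("anti", "ri", "super", "pre")

# Toy lexicon mapping known base words to a coarse part of speech.
LEXICON = {
    "fare": "VERB",
    "vedere": "VERB",
    "mercato": "NOUN",
    "nucleare": "ADJ",
}

# Assumed constraint: the base categories each prefix may attach to.
ALLOWED_BASES = {
    "anti": {"NOUN", "ADJ"},
    "ri": {"VERB"},
    "super": {"NOUN", "ADJ"},
    "pre": {"NOUN", "ADJ", "VERB"},
}


def decompose(word, constrained=True):
    """Return (prefix, base, category) analyses for an unknown word.

    An analysis is proposed when the word starts with a listed prefix and
    the remainder is a known lexicon entry. With constrained=True, analyses
    whose base category is not allowed for that prefix are discarded.
    """
    analyses = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        # Drop an optional hyphen between prefix and base.
        base = word[len(prefix):].lstrip("-")
        category = LEXICON.get(base)
        if category is None:
            continue  # remainder is not a known word: no analysis
        if constrained and category not in ALLOWED_BASES[prefix]:
            continue  # analysis filtered out by the constraint
        analyses.append((prefix, base, category))
    return analyses


if __name__ == "__main__":
    for w in ("rifare", "antinucleare", "rimercato"):
        print(w, decompose(w, constrained=False), decompose(w))
```

In this sketch the made-up string "rimercato" is analysed as "ri" + "mercato" when no constraint is applied, and receives no analysis once the constraint is switched on. That is the kind of noise reduction the abstract describes. A constraint that is too strict would also discard some correct analyses, which corresponds to the increase in silence.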
Anthology ID:
L06-1097
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/182_pdf.pdf
Cite (ACL):
Bruno Cartoni. 2006. Dealing with unknown words by simple decomposition: feasibility studies with Italian prefixes. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Dealing with unknown words by simple decomposition: feasibility studies with Italian prefixes. (Cartoni, LREC 2006)