Automatic extraction of subcategorization frames for French

Paula Chesley, Susanne Salmon-Alt


Abstract
This paper describes the automatic extraction of French subcategorization frames from corpora. The subcategorization frames have been acquired via VISL, a dependency-based parser (Bick 2003), whose verb lexicon is currently incomplete with respect to subcategorization frames. Therefore, we have implemented binomial hypothesis testing as a post-parsing filtering step. On a test set of 104 frequent verbs we achieve lower bounds on type precision at 86.8% and on token recall at 54.3%. These results show that, contra (Korhonen et al. 2000), binomial hypothesis testing can be robust for determining subcategorization frames given corpus data. Additionally, we estimate that our extracted subcategorization frames account for 85.4% of all frames in French corpora. We conclude that using a language resource, such as the VISL parser, with a currently unevaluated (and potentially high) error rate can yield robust results in conjunction with probabilistic filtering of the resource output.
Anthology ID:
L06-1052
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/101_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Paula Chesley and Susanne Salmon-Alt. 2006. Automatic extraction of subcategorization frames for French. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Automatic extraction of subcategorization frames for French (Chesley & Salmon-Alt, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/101_pdf.pdf