A Factored Functional Dependency Transformation of the English Penn Treebank for Probabilistic Surface Generation

Irene Langkilde-Geary, Justin Betteridge


Abstract
This paper describes a featurized functional dependency corpus automatically derived from the Penn Treebank. Each word in the corpus is associated with over three dozen features describing the functional syntactic structure of its sentence, as well as some shallow morphology. The corpus was created for use in probabilistic surface generation, but it could also serve as a resource for the study of English and for the development of other NLP applications.
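To make the idea of a per-word featurized dependency corpus more concrete, here is a minimal Python sketch of one way such records could be represented and queried. It assumes each word stores its surface form, the index of its head word, and a dictionary of features. The field names, the feature names (`role`, `cat`, `number`, `tense`), and the helper `dependents` are hypothetical illustrations, not the corpus's actual file format or feature inventory.

```python
from dataclasses import dataclass, field

# Hypothetical per-word record. The feature names used below are
# illustrative placeholders, not the corpus's real feature set.
@dataclass
class Token:
    index: int                 # 1-based position of the word in the sentence
    word: str                  # surface form
    head: int                  # index of the governing word (0 = sentence root)
    features: dict = field(default_factory=dict)  # feature name -> value


def dependents(sentence, head_index):
    """Return the tokens whose head is head_index, in surface order."""
    return [t for t in sentence if t.head == head_index]


sentence = [
    Token(1, "The", 2, {"role": "determiner", "cat": "DT"}),
    Token(2, "dog", 3, {"role": "subject", "cat": "NN", "number": "singular"}),
    Token(3, "barked", 0, {"role": "root", "cat": "VBD", "tense": "past"}),
]

# Find the root word, then list the words that depend on it.
root = dependents(sentence, 0)[0]
print(root.word, [t.word for t in dependents(sentence, root.index)])
# prints: barked ['dog']
```

Keeping the features in a dictionary rather than fixed fields is just one design choice for a sketch like this; it lets each word carry a different subset of features without changing the record type.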
Anthology ID:
L06-1256
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/435_pdf.pdf
Cite (ACL):
Irene Langkilde-Geary and Justin Betteridge. 2006. A Factored Functional Dependency Transformation of the English Penn Treebank for Probabilistic Surface Generation. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
A Factored Functional Dependency Transformation of the English Penn Treebank for Probabilistic Surface Generation (Langkilde-Geary & Betteridge, LREC 2006)