@article{lewis-steedman-2014-improved,
    title = "Improved {CCG} Parsing with Semi-supervised Supertagging",
    author = "Lewis, Mike and
      Steedman, Mark",
    editor = "Lin, Dekang and
      Collins, Michael and
      Lee, Lillian",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "2",
    year = "2014",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q14-1026",
    doi = "10.1162/tacl_a_00186",
    pages = "327--338",
    abstract = "Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced, by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8{\%}), Wikipedia (1.8{\%}) and biomedical (3.4{\%}) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="lewis-steedman-2014-improved">
    <titleInfo>
        <title>Improved CCG Parsing with Semi-supervised Supertagging</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Mike</namePart>
        <namePart type="family">Lewis</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mark</namePart>
        <namePart type="family">Steedman</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2014</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <genre authority="bibutilsgt">journal article</genre>
    <relatedItem type="host">
        <titleInfo>
            <title>Transactions of the Association for Computational Linguistics</title>
        </titleInfo>
        <originInfo>
            <issuance>continuing</issuance>
            <publisher>MIT Press</publisher>
            <place>
                <placeTerm type="text">Cambridge, MA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">periodical</genre>
        <genre authority="bibutilsgt">academic journal</genre>
    </relatedItem>
    <abstract>Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced, by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.</abstract>
    <identifier type="citekey">lewis-steedman-2014-improved</identifier>
    <identifier type="doi">10.1162/tacl_a_00186</identifier>
    <location>
        <url>https://aclanthology.org/Q14-1026</url>
    </location>
    <part>
        <date>2014</date>
        <detail type="volume"><number>2</number></detail>
        <extent unit="page">
            <start>327</start>
            <end>338</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Journal Article
%T Improved CCG Parsing with Semi-supervised Supertagging
%A Lewis, Mike
%A Steedman, Mark
%J Transactions of the Association for Computational Linguistics
%D 2014
%V 2
%I MIT Press
%C Cambridge, MA
%F lewis-steedman-2014-improved
%X Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced, by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
%R 10.1162/tacl_a_00186
%U https://aclanthology.org/Q14-1026
%U https://doi.org/10.1162/tacl_a_00186
%P 327-338
Markdown (Informal)
[Improved CCG Parsing with Semi-supervised Supertagging](https://aclanthology.org/Q14-1026) (Lewis & Steedman, TACL 2014)
ACL
Mike Lewis and Mark Steedman. 2014. Improved CCG Parsing with Semi-supervised Supertagging. Transactions of the Association for Computational Linguistics, 2:327–338.