May Lai-Yin Wong


2006

pdf bib
Skeleton Parsing in Chinese: Annotation Scheme and Guidelines
May Lai-Yin Wong
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents my manual skeleton parsing on a sample text of approximately 100,000 word tokens (or about 2,500 sentences) taken from the PFR Chinese Corpus with a clearly defined parsing scheme of 17 constituent labels. The manually-parsed sample skeleton treebank is one of the very few extant Chinese treebanks. While Chinese part-of-speech tagging and word segmentation have been the subject of concerted research for many years, the syntactic annotation of Chinese corpora is a comparatively new field. The difficulties that I encountered in the production of this treebank demonstrate some of the peculiarities of Chinese syntax. A noteworthy syntactic property is that some serial verb constructions tend to be used as if they were compound verbs. The two transitive verbs in series, unlike common transitive verbs, do not take an object separately within the construction; rather, the serial construction as a whole is able to take the same direct object and the perfective aspect marker le. The skeleton-parsed sample treebank is evaluated against Eyes & Leech (1993)’s criteria and proves to be accurate, uniform and linguistically valid.
Search
Co-authors
    Venues