Joint Grammar and Treebank Development for Mandarin Chinese with HPSG

Yi Zhang, Rui Wang, Yu Chen


Abstract
We present the ongoing development of MCG, a linguistically deep and precise grammar for Mandarin Chinese, together with its accompanying treebank. Both are based on the linguistic framework of HPSG and use MRS as the semantic representation. We highlight some key features of our grammar design and review a number of challenging phenomena, comparing them with alternative linguistic treatments and implementations. A distinguishing characteristic of our approach is the tight integration of grammar and treebank development. The two-step treebank annotation procedure benefits from the efficiency of discriminant-based annotation, while giving annotators full freedom to produce extra-grammatical structures. This not only allows the creation of a precise, full-coverage treebank with an imperfect grammar, but also gives grammarians prompt feedback for identifying errors in the grammar design and implementation. Preliminary evaluation and error analysis show that the grammar already covers most of the core phenomena of Mandarin Chinese, and that the treebank annotation procedure reaches a stable speed of 35 sentences per hour with satisfactory quality.
Anthology ID:
L12-1167
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1868–1873
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/345_Paper.pdf
Cite (ACL):
Yi Zhang, Rui Wang, and Yu Chen. 2012. Joint Grammar and Treebank Development for Mandarin Chinese with HPSG. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1868–1873, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Joint Grammar and Treebank Development for Mandarin Chinese with HPSG (Zhang et al., LREC 2012)