A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies

Hongzhi Xu, Helen Kaiyun Chen, Chu-Ren Huang, Qin Lu, Dingxu Shi, Tin-Shing Chiu


Abstract
We adopt a corpus-informed approach to selecting example sentences for the construction of a reference grammar. In the process, we construct a database of sentences carefully selected by linguistic experts, covering the full range of linguistic facts in an authoritative Chinese Reference Grammar and structured according to that grammar. A search engine is developed to help users find the most typical examples they need to study a linguistic problem or prove their hypotheses. The database can also serve computational linguists as a training corpus for Chinese word segmentation, POS tagging, and sentence parsing models.
Anthology ID:
L12-1207
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3140–3144
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/401_Paper.pdf
Cite (ACL):
Hongzhi Xu, Helen Kaiyun Chen, Chu-Ren Huang, Qin Lu, Dingxu Shi, and Tin-Shing Chiu. 2012. A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3140–3144, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies (Xu et al., LREC 2012)