BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’

Masayuki Asahara, Yuji Matsumoto


Abstract
Paratactic syntactic structures are difficult to represent in dependency trees. We therefore propose an annotation schema for syntactic dependency annotation of Japanese in which coordinate structures are separated from, and overlaid on, bunsetsu-based (base phrase unit) dependencies. The schema represents nested coordinate structures, non-constituent conjuncts, and forward sharing as sets of regions. The annotation was performed on the core data of the ‘Balanced Corpus of Contemporary Written Japanese’, which comprises about one million words and 1,980 samples from six registers, including newspapers, books, magazines, and web texts.
Anthology ID:
W16-5406
Volume:
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Koiti Hasida, Kam-Fai Wong, Nicoletta Calzolari, Key-Sun Choi
Venue:
ALR
Publisher:
The COLING 2016 Organizing Committee
Pages:
49–58
URL:
https://aclanthology.org/W16-5406
Cite (ACL):
Masayuki Asahara and Yuji Matsumoto. 2016. BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 49–58, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’ (Asahara & Matsumoto, ALR 2016)
PDF:
https://aclanthology.org/W16-5406.pdf