Linguistic Data Retrievable from a Treebank

Verginica Barbu Mititelu, Elena Irimia


Abstract
This paper describes the Romanian treebank annotated according to the Universal Dependencies principles. We present the types of texts included in the treebank, their processing phases and the tools used in each phase, as well as the levels of annotation, with a focus on the syntactic level. We briefly present the syntactic formalism used, the principles followed and the set of relations. We adopt the perspective of a linguist who searches the treebank for information relevant to the study of Romanian. Such a user can interpret statistics computed on the corpus and can also query the treebank to find examples supporting a theory, to test hypotheses or to discover new tendencies. We use passive constructions in Romanian as a case study to show how statistical data help in understanding this linguistic phenomenon. We also discuss which kinds of linguistic information are and are not retrievable from the treebank, given its annotation principles.
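As an illustration of the kind of corpus statistic the abstract refers to, the sketch below counts passive-related dependency relations in a treebank stored in CoNLL-U, the standard Universal Dependencies file format (10 tab-separated columns per token, the 8th being DEPREL). It is a minimal sketch under stated assumptions, not the authors' tooling: the helper `count_passive_relations` is hypothetical, and the label set is an assumption. UD v2 uses `aux:pass`, `nsubj:pass`, `csubj:pass` and `expl:pass`, while UD v1 (current in 2016) used `auxpass`, `nsubjpass` and `csubjpass`, so both are included and should be checked against the release being queried.

```python
from collections import Counter

# ASSUMPTION: passive-related relation labels. UD v2 uses subtyped names;
# UD v1 used auxpass/nsubjpass/csubjpass. Which labels a given release of the
# Romanian treebank uses must be checked against its documentation.
PASSIVE_RELS = {
    "aux:pass", "nsubj:pass", "csubj:pass", "expl:pass",  # UD v2 labels
    "auxpass", "nsubjpass", "csubjpass",                  # UD v1 labels
}


def count_passive_relations(conllu_text):
    """Hypothetical helper: count passive-related DEPREL labels in CoNLL-U text
    and the number of sentences containing at least one of them."""
    rel_counts = Counter()
    passive_sentences = 0
    sentence_is_passive = False
    # The extra "" closes the last sentence even if the text lacks a final blank line.
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence_is_passive:
                passive_sentences += 1
            sentence_is_passive = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        deprel = cols[7]  # 8th column: DEPREL
        if deprel in PASSIVE_RELS:
            rel_counts[deprel] += 1
            sentence_is_passive = True
    return rel_counts, passive_sentences


if __name__ == "__main__":
    # Toy, hand-made annotation for illustration only (not taken from the treebank).
    sample = (
        "# text = Cartea este citită .\n"
        "1\tCartea\tcarte\tNOUN\t_\t_\t3\tnsubj:pass\t_\t_\n"
        "2\teste\tfi\tAUX\t_\t_\t3\taux:pass\t_\t_\n"
        "3\tcitită\tciti\tVERB\t_\t_\t0\troot\t_\t_\n"
        "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
        "\n"
    )
    counts, n_passive = count_passive_relations(sample)
    print(dict(counts), n_passive)
```

Dividing the number of passive sentences by the total number of sentences would give a rough passive rate. Such counts are only as informative as the annotation choices behind the labels, which is exactly the kind of retrievability limit the paper discusses.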
Anthology ID:
2016.clib-1.3
Volume:
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)
Month:
September
Year:
2016
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages:
19–27
URL:
https://aclanthology.org/2016.clib-1.3
Cite (ACL):
Verginica Barbu Mititelu and Elena Irimia. 2016. Linguistic Data Retrievable from a Treebank. In Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016), pages 19–27, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal):
Linguistic Data Retrievable from a Treebank (Barbu Mititelu & Irimia, CLIB 2016)
PDF:
https://aclanthology.org/2016.clib-1.3.pdf