Marisa Campos
2016
CINTIL DependencyBank PREMIUM - A Corpus of Grammatical Dependencies for Portuguese
Rita de Carvalho | Andreia Querido | Marisa Campos | Rita Valadas Pereira | João Silva | António Branco
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents a new linguistic resource for the study and computational processing of Portuguese. CINTIL DependencyBank PREMIUM is a corpus of Portuguese news text, manually and accurately annotated with a wide range of linguistic information (morpho-syntax, named entities, syntactic functions and semantic roles), making it an invaluable resource, especially for the development and evaluation of data-driven natural language processing tools. The corpus is under active development and reaches 4,000 sentences in its current version. The paper also reports on the training and evaluation of a dependency parser over this corpus. CINTIL DependencyBank PREMIUM is freely available for research purposes through META-SHARE.