Adding a Syntactic Annotation Level to the Corpus of Contemporary Romanian Language

Andrei Scutelnicu, Catalina Maranduc, Dan Cristea


Abstract
In this paper we present an experiment in augmenting the Corpus of Contemporary Romanian Language (CoRoLa) with a syntactic annotation level, following the Universal Dependencies (UD) model, which would allow users to query the syntax of Romanian sentences. After a short introduction to CoRoLa, we describe the treebanks used to train the dependency parser, present the evaluation results, and describe the process of upgrading CoRoLa with the new annotation level. Out of three parser variants trained on manually built treebanks, we chose the one with the best accuracy in recognizing heads and relations.
Keywords: syntactic annotation, treebank, corpus, MaltParser
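The abstract selects the parser by how accurately it recovers heads and relations. In dependency parsing this is usually reported as the unlabeled attachment score (UAS, correct head) and the labeled attachment score (LAS, correct head and relation). The sketch below assumes that reading and assumes input in CoNLL-U, the file format used by Universal Dependencies treebanks. The function names and the toy Romanian sentence are hypothetical illustrations, not the authors' evaluation script or data.

```python
"""Minimal UAS/LAS scorer for two token-aligned CoNLL-U strings (a sketch)."""
from typing import List, Tuple


def read_tokens(conllu_text: str) -> List[Tuple[str, str]]:
    """Return (HEAD, DEPREL) for every syntactic word in a CoNLL-U string."""
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separator or comment line
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node: not scored here
        tokens.append((cols[6], cols[7]))  # HEAD is column 7, DEPREL column 8
    return tokens


def attachment_scores(gold_text: str, pred_text: str) -> Tuple[float, float]:
    """Return (UAS, LAS), assuming gold and prediction share tokenization."""
    gold = read_tokens(gold_text)
    pred = read_tokens(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files are not token-aligned")
    if not gold:
        return 0.0, 0.0
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return head_ok / len(gold), both_ok / len(gold)


def row(i: int, form: str, head: int, rel: str) -> str:
    """Build a 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([str(i), form, "_", "_", "_", "_", str(head), rel, "_", "_"])


# Hypothetical toy sentence "Ion citește o carte." (not from the paper's data).
gold_text = "\n".join([
    "# text = Ion citește o carte.",
    row(1, "Ion", 2, "nsubj"),
    row(2, "citește", 0, "root"),
    row(3, "o", 4, "det"),
    row(4, "carte", 2, "obj"),
    row(5, ".", 2, "punct"),
])
# Prediction: wrong relation on token 4, wrong head on token 5.
pred_text = "\n".join([
    row(1, "Ion", 2, "nsubj"),
    row(2, "citește", 0, "root"),
    row(3, "o", 4, "det"),
    row(4, "carte", 2, "obl"),
    row(5, ".", 4, "punct"),
])

uas, las = attachment_scores(gold_text, pred_text)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.80 LAS=0.60
```

This sketch compares full relation labels, subtypes included; some official scorers (for example the CoNLL 2018 shared task script) compare only the universal part of the label, so scores from different tools may not be directly comparable.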
Anthology ID: 2020.cmlc-1.9
Volume: Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Month: May
Year: 2020
Address: Marseille, France
Editors: Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen, Ines Pisetta
Venue: CMLC
Publisher: European Language Resources Association
Pages: 58–62
Language: English
URL: https://aclanthology.org/2020.cmlc-1.9
Cite (ACL): Andrei Scutelnicu, Catalina Maranduc, and Dan Cristea. 2020. Adding a Syntactic Annotation Level to the Corpus of Contemporary Romanian Language. In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pages 58–62, Marseille, France. European Language Resources Association.
Cite (Informal): Adding a Syntactic Annotation Level to the Corpus of Contemporary Romanian Language (Scutelnicu et al., CMLC 2020)
PDF: https://aclanthology.org/2020.cmlc-1.9.pdf