2023
Named Entity layer in Estonian UD treebanks
Kadri Muischnek, Kaili Müürisep
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
This paper introduces two new language resources, both NE-annotated corpora for Estonian: the Estonian Universal Dependencies Treebank (EDT, 440,000 tokens) and the Estonian Universal Dependencies Web Treebank (EWT, 90,000 tokens). Together they make up the largest publicly available gold-standard named entity dataset for Estonian. Eight NE categories are manually annotated in this dataset, and its additional annotation for lemmas, POS, morphological features and dependency relations makes it even more valuable. We also show that dividing named entities into clear-cut categories is not always easy.
2017
Estonian Copular and Existential Constructions as an UD Annotation Problem
Kadri Muischnek, Kaili Müürisep
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)
2016
Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
Kadri Muischnek, Kaili Müürisep, Tiina Puolakainen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents the first version of the Estonian Universal Dependencies Treebank, which has been semi-automatically derived from the Estonian Dependency Treebank and comprises ca. 400,000 words (ca. 30,000 sentences) representing the genres of fiction, newspapers and scientific writing. The article analyses the differences between the two annotation schemes and the procedure for converting to the Universal Dependencies format. The conversion was carried out with manually created Constraint Grammar transfer rules. Because the rules can take unbounded context into account, include lexical information and use both flat and tree structure features at the same time, the method has proved reliable and flexible enough to handle most of the transformations. The automatic conversion achieved LAS 95.2%, UAS 96.3% and LA 98.4%. With punctuation marks excluded from the calculations, we observed LAS 96.4%, UAS 97.7% and LA 98.2%. Still, the guidelines and methodology need further refinement in order to re-annotate some syntactic phenomena, e.g. inter-clausal relations. Although the automatic rules usually make a good guess even in obscure contexts, some relations should be checked and annotated manually after the main conversion.
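For readers unfamiliar with the three scores reported above, the following is a minimal sketch of how they are conventionally defined: UAS is the share of tokens with the correct head, LA the share with the correct relation label, and LAS the share with both correct. It assumes the gold and converted analyses are token-aligned; the `Token` class and `attachment_scores` function are hypothetical illustrations, and identifying punctuation by the UPOS tag `PUNCT` is an assumption, not the authors' actual evaluation procedure.

```python
from dataclasses import dataclass


@dataclass
class Token:
    head: int        # index of the syntactic head (0 = root)
    deprel: str      # dependency relation label
    upos: str = ""   # universal POS tag; used here only to detect punctuation


def attachment_scores(gold, pred, exclude_punct=False):
    """Return (LAS, UAS, LA) as percentages for token-aligned gold/predicted lists.

    Hypothetical helper: punctuation is assumed to be marked with UPOS "PUNCT"
    in the gold analysis.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same tokens")
    # Optionally drop punctuation tokens (judged by the gold UPOS tag).
    pairs = [(g, p) for g, p in zip(gold, pred)
             if not (exclude_punct and g.upos == "PUNCT")]
    if not pairs:
        raise ValueError("no tokens to score")
    n = len(pairs)
    uas = sum(g.head == p.head for g, p in pairs)                          # correct head
    la = sum(g.deprel == p.deprel for g, p in pairs)                       # correct label
    las = sum(g.head == p.head and g.deprel == p.deprel for g, p in pairs)  # both correct
    return tuple(round(100 * c / n, 1) for c in (las, uas, la))


# Toy four-token sentence: one label error and one head error on punctuation.
gold = [Token(2, "nsubj", "PRON"), Token(0, "root", "VERB"),
        Token(2, "obj", "NOUN"), Token(2, "punct", "PUNCT")]
pred = [Token(2, "nsubj", "PRON"), Token(0, "root", "VERB"),
        Token(2, "obl", "NOUN"), Token(3, "punct", "PUNCT")]

print(attachment_scores(gold, pred))                      # (50.0, 75.0, 75.0)
print(attachment_scores(gold, pred, exclude_punct=True))  # (66.7, 100.0, 66.7)
```

The toy example shows why excluding punctuation can move the scores in different directions, as in the figures above: here removing a misattached punctuation token raises UAS, while LA drops because the remaining label error weighs more in a smaller denominator.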
2001
Parsing Estonian with Constraint Grammar
Kaili Müürisep
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)
1999
Determination of Syntactic Functions in Estonian Constraint Grammar
Kaili Müürisep
Ninth Conference of the European Chapter of the Association for Computational Linguistics