2022
pdf
bib
abs
Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs
Ana-Maria Barbu
|
Verginica Barbu Mititelu
|
Cătălin Mititelu
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We present here the efforts of aligning two language resources for Romanian: the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs: for each occurrence of those verbs in the treebank that were included as entries in the lexicon, a set of valence frames is automatically assigned, then manually validated by two linguists and, when necessary, corrected. Validating a valence frame also means semantically disambiguating the verb in the respective context. The validation is done by two linguists, on complementary datasets. However, a subset of verbs were validated by both annotators and Cohen’s κ is 0.87 for this subset. The alignment we have made also serves as a method of enhancing the quality of the two resources, as in the process we identify morpho-syntactic annotation mistakes, incomplete valence frames or missing ones. Information from each resource complements the information from the other, thus their value increases. The treebank and the lexicon are freely available, while the links discovered between them are also made available on GitHub.
2017
pdf
bib
Syntactic Semantic Correspondence in Dependency Grammar
Cătălina Mărănduc
|
Cătălin Mititelu
|
Victoria Bobicev
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
pdf
bib
abs
A Multiform Balanced Dependency Treebank for Romanian
Mihaela Colhon
|
Cătălina Mărănduc
|
Cătălin Mititelu
Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017
The UAIC-RoDia-DepTb is a balanced treebank, containing texts in non-standard language: 2,575 chats sentences, old Romanian texts (a Gospel printed in 1648, a codex of laws printed in 1818, a novel written in 1910), regional popular poetry, legal texts, Romanian and foreign fiction, quotations. The proportions are comparable; each of these types of texts is represented by subsets of at least 1,000 phrases, so that the parser can be trained on their peculiarities. The annotation of the treebank started in 2007, and it has classical tags, such as those in school grammar, with the intention of using the resource for didactic purposes. The classification of circumstantial modifiers is rich in semantic information. We present in this paper the development in progress of this resource which has been automatically annotated and entirely manually corrected. We try to add new texts, and to make it available in more formats, by keeping all the morphological and syntactic information annotated, and adding logical-semantic information. We will describe here two conversions, from the classic syntactic format into Universal Dependencies format and into a logical-semantic layer, which will be shortly presented.
2014
pdf
bib
WordFinder
Catalin Mititelu
|
Verginica Barbu Mititelu
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)