Maria Andreevna Petrova


2024

pdf bib
CoBaLD Annotation: The Enrichment of the Enhanced Universal Dependencies with the Semantical Pattern
Maria Andreevna Petrova | Alexandra M. Ivoylova | Anastasia Tishchenkova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The paper is devoted to the annotation format aimed at morphological, syntactic and especially semantic markup. The format combines the Enhanced UD morphosyntax and the Compreno semantic pattern, enriching the UD annotation with word meanings and labels for semantic relations between words. To adapt the Compreno semantics for the current purpose, we reduced the number of the semantic fields denoting lexical meanings by using hyperonym fields. Moreover, we used a generalized variant of the semantic relations as the original roles possess rather narrow meanings which makes them too numerous. Creating such a format demands the Compreno-to-UD morphosyntax conversion as well, which, in turn, demands solving the asymmetry problem between the models. The asymmetry concerns tokenization, lemmatization, POS-tagging, sets of grammatical features and dependency heads. To overcome this problem, the Compreno-to-UD converter was created. As an application, the work presents a 150,000 token corpus of English news annotated according to the standard.