2020
CBOW-tag: a Modified CBOW Algorithm for Generating Embedding Models from Annotated Corpora
Attila Novák, László Laki, Borbála Novák
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper, we present a modified version of the CBOW algorithm implemented in the fastText framework. Our modified algorithm, CBOW-tag, builds a vector space model that represents the original word forms and their annotations at the same time. We illustrate the results by presenting a model built from a corpus that includes morphological and syntactic annotations. The simultaneous presence of unannotated elements and different annotations in the model makes it possible to constrain nearest neighbour queries to specific types of elements. The model can thus efficiently answer questions such as "What do we eat?", "What can we do with a skeleton?", and "What else do we do with what we eat?". Error analysis reveals that the model can highlight errors introduced into the annotation by the tagger and parser we used to generate the annotations, as well as lexical peculiarities in the corpus itself, especially if we do not limit the vocabulary of the model to frequent items.
2019
Creation of a corpus with semantic role labels for Hungarian
Attila Novák, László Laki, Borbála Novák, Andrea Dömötör, Noémi Ligeti-Nagy, Ágnes Kalivoda
Proceedings of the 13th Linguistic Annotation Workshop
In this article, we present ongoing research whose immediate goal is to create a corpus annotated with semantic role labels for Hungarian that can be used to train a parser-based system capable of formulating relevant questions about the text it processes. We briefly describe the objectives of our research and our efforts at eliminating errors in the Hungarian Universal Dependencies corpus, which we use as the base of our annotation effort; at creating a Hungarian verbal argument database annotated with thematic roles; at classifying adjuncts; and at matching verbal argument frames to specific occurrences of verbs and participles in the corpus.
2014
An efficient language independent toolkit for complete morphological disambiguation
László Laki, György Orosz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we present HuLaPos2, a language-independent tool for complete morphological annotation based on the Moses SMT toolkit. Our system performs PoS tagging and lemmatization simultaneously. Among other things, the algorithm used can handle phrases instead of unigrams, and can perform the tagging in a not strictly left-to-right order. By exploiting these advantages, our system outperforms the HMM-based ones. To handle unknown words, a suffix-tree based guesser was integrated into HuLaPos2. To demonstrate the performance of our system, we compared it with several systems across different languages and PoS tag sets. In general, the quality of HuLaPos2 is comparable with that of state-of-the-art systems, and in the case of PoS tagging it outperformed many available systems.
2013
English to Hungarian Morpheme-based Statistical Machine Translation System with Reordering Rules
László Laki, Attila Novák, Borbála Siklósi
Proceedings of the Second Workshop on Hybrid Approaches to Translation