2024
Classifying Multi–Word Expressions in the Latvian Monolingual Electronic Dictionary Tēzaurs.lv
Laura Rituma | Gunta Nešpore-Bērzkalne | Agute Klints | Ilze Lokmane | Madara Stāde | Pēteris Paikens
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
The electronic dictionary Tēzaurs.lv contains more than 400,000 entries, of which 73,000 are multi-word expressions (MWEs). Over the past two years, these MWEs have been progressively divided into subgroups (proper names, multi-word terms, taxa, phraseological units, collocations). The article describes the classification of MWEs, focusing on phraseological units (approximately 7,250 entries), as well as on borderline cases between phraseological unit types (phrasemes and idioms) and between different MWE groups in general. The division of phraseological units depends on semantic divisibility and figurativeness. In a phraseme, at least one of the constituents retains its literal sense, whereas the meaning of an idiom does not depend on the literal sense of any of its constituents. As a result, 65,919 MWE entries have been manually classified, and information on MWE type is now available to users of the electronic dictionary Tēzaurs.lv.
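The phraseme/idiom distinction stated in the abstract can be illustrated with a minimal sketch. The `Constituent` type, the `literal` flag, and the English example expressions are all hypothetical illustrations, not the dictionary's data model or the authors' annotation procedure (which was manual).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Constituent:
    word: str
    literal: bool  # hypothetical flag: does this word keep its literal sense?


def classify_phraseological_unit(constituents: List[Constituent]) -> str:
    """Return "phraseme" if at least one constituent retains its literal
    sense, otherwise "idiom" (the rule as stated in the abstract)."""
    if not constituents:
        raise ValueError("a phraseological unit needs at least one constituent")
    if any(c.literal for c in constituents):
        return "phraseme"
    return "idiom"


# Illustrative English examples (assumptions, not entries from Tēzaurs.lv).
white_lie = [Constituent("white", False), Constituent("lie", True)]
kick_the_bucket = [
    Constituent("kick", False),
    Constituent("the", False),
    Constituent("bucket", False),
]

print(classify_phraseological_unit(white_lie))
print(classify_phraseological_unit(kick_the_bucket))
```

In practice the hard part is deciding the `literal` flag itself, which is where the borderline cases discussed in the article arise.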
2021
Domain Expert Platform for Goal-Oriented Dialog Collection
Didzis Goško
|
Arturs Znotins
|
Inguna Skadina
|
Normunds Gruzitis
|
Gunta Nešpore-Bērzkalne
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Today, most dialogue systems are fully or partly built using neural network architectures. A crucial prerequisite for the creation of a goal-oriented neural network dialogue system is a dataset that represents typical dialogue scenarios and includes various semantic annotations, e.g. intents, slots and dialogue actions, that are necessary for training a particular neural network architecture. In this demonstration paper, we present an easy-to-use interface and its back-end, oriented to domain experts, for the collection of goal-oriented dialogue samples. The platform not only allows users to collect or write sample dialogues in a structured way, but also provides a means for simple annotation and interpretation of the dialogues. The platform itself is language-independent; it depends only on the availability of particular language processing components for a specific language. It is currently being used to collect dialogue samples in Latvian (a highly inflected language) which represent typical communication between students and the student service.
2020
Deriving a PropBank Corpus from Parallel FrameNet and UD Corpora
Normunds Gruzitis
|
Roberts Darģis
|
Laura Rituma
|
Gunta Nešpore-Bērzkalne
|
Baiba Saulite
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
We propose an approach for generating an accurate and consistent PropBank-annotated corpus, given a FrameNet-annotated corpus which has an underlying dependency annotation layer, namely, a parallel Universal Dependencies (UD) treebank. The PropBank annotation layer of such a multi-layer corpus can be semi-automatically derived from the existing FrameNet and UD annotation layers, by providing a mapping configuration from lexical units in [a non-English language] FrameNet to [English language] PropBank predicates, and a mapping configuration from FrameNet frame elements to PropBank semantic arguments for the given pair of a FrameNet frame and a PropBank predicate. The latter mapping generally depends on the underlying UD syntactic relations. To demonstrate our approach, we use Latvian FrameNet, annotated on top of the Latvian UD Treebank, to generate a Latvian PropBank in compliance with the Universal Propositions approach.
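The two mapping configurations described in the abstract can be pictured with a small sketch. Everything here is an assumption made for illustration: the dictionary layout, the function name `derive_propbank`, and the single example entry (a Latvian lemma `dot` in the `Giving` frame mapped to the PropBank roleset `give.01`) are hypothetical and do not reproduce the authors' configuration format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping 1: (lemma, FrameNet frame) -> PropBank predicate.
LU_TO_ROLESET: Dict[Tuple[str, str], str] = {
    ("dot", "Giving"): "give.01",
}

# Hypothetical mapping 2: for a (frame, predicate) pair, map a frame element
# together with its UD relation to a PropBank argument. A relation of None
# acts as a fallback that ignores syntax.
FE_TO_ARG: Dict[Tuple[str, str], Dict[Tuple[str, Optional[str]], str]] = {
    ("Giving", "give.01"): {
        ("Donor", "nsubj"): "ARG0",
        ("Theme", "obj"): "ARG1",
        ("Recipient", None): "ARG2",
    },
}


def derive_propbank(
    lemma: str, frame: str, elements: List[Tuple[str, str, str]]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Derive a PropBank annotation from one FrameNet annotation.

    `elements` holds (frame element, UD relation, text span) triples.
    Returns None when the lexical unit has no configured predicate.
    """
    roleset = LU_TO_ROLESET.get((lemma, frame))
    if roleset is None:
        return None
    fe_map = FE_TO_ARG.get((frame, roleset), {})
    args: Dict[str, str] = {}
    for fe, deprel, span in elements:
        # Prefer the syntax-specific rule, then the syntax-agnostic fallback.
        arg = fe_map.get((fe, deprel)) or fe_map.get((fe, None))
        if arg is not None:
            args[arg] = span
    return roleset, args


result = derive_propbank(
    "dot",
    "Giving",
    [("Donor", "nsubj", "SPAN_A"), ("Theme", "obj", "SPAN_B"), ("Recipient", "iobj", "SPAN_C")],
)
print(result)
```

Keying the second mapping on the UD relation as well as the frame element reflects the abstract's remark that the frame-element mapping generally depends on the underlying syntactic relations.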