Giedrė Valūnaitė Oleškevičienė

Also published as: Giedre Valunaite Oleskeviciene, Giedrė Valūnaitė Oleškevičienė, Giedrė Valūnaitė-Oleškevičienė


2024

LLODIA: A Linguistic Linked Open Data Model for Diachronic Analysis
Florentina Armaselu | Chaya Liebeskind | Paola Marongiu | Barbara McGillivray | Giedre Valunaite Oleskeviciene | Elena-Simona Apostol | Ciprian-Octavian Truica | Daniela Gifu
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024

This article proposes a linguistic linked open data model for diachronic analysis (LLODIA) that combines data derived from diachronic analysis of multilingual corpora with dictionary-based evidence. A humanities use case was devised as a proof of concept that includes examples in five languages (French, Hebrew, Latin, Lithuanian and Romanian) related to various meanings of the term “revolution” considered at different time intervals. The examples were compiled through diachronic word embedding and dictionary alignment.

Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024
Gilles Sérasset | Hugo Gonçalo Oliveira | Giedre Valunaite Oleskeviciene

Self-Evaluation of Generative AI Prompts for Linguistic Linked Open Data Modelling in Diachronic Analysis
Florentina Armaselu | Chaya Liebeskind | Giedre Valunaite Oleskeviciene
Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024

This article addresses the question of evaluating generative AI prompts designed for specific tasks such as linguistic linked open data modelling and refining of word embedding results. The prompts were created to assist the pre-modelling phase in the construction of LLODIA, a linguistic linked open data model for diachronic analysis. We present a self-evaluation framework based on the method known in the literature as LLM-Eval. The discussion includes prompts related to the RDF-XML conception of the model, as well as neighbour list refinement, dictionary alignment and contextualisation for the term “revolution” in French, Hebrew and Lithuanian, as a proof of concept.

Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish
Deniz Zeyrek | Giedrė Valūnaitė Oleškevičienė | Amalia Mendes
Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024

From Linguistic Linked Data to Big Data
Dimitar Trajanov | Elena Apostol | Radovan Garabik | Katerina Gkirtzou | Dagmar Gromann | Chaya Liebeskind | Cosimo Palma | Michael Rosner | Alexia Sampri | Gilles Sérasset | Blerina Spahiu | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With advances in the field of Linked (Open) Data (LOD), language data on the LOD cloud has grown in number, size, and variety. With an increased volume and variety of language data, optimizations of methods for distributing, storing, and querying these data become more central. To this end, this position paper investigates use cases at the intersection of LLOD and Big Data, reviews existing approaches to utilizing Big Data techniques within the context of linked data, and discusses the challenges and benefits of this union.

MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
Dagmar Gromann | Hugo Goncalo Oliveira | Lucia Pitarch | Elena-Simona Apostol | Jordi Bernad | Eliot Bytyçi | Chiara Cantone | Sara Carvalho | Francesca Frontini | Radovan Garabik | Jorge Gracia | Letizia Granata | Fahad Khan | Timotej Knez | Penny Labropoulou | Chaya Liebeskind | Maria Pia Di Buono | Ana Ostroški Anić | Sigita Rackevičienė | Ricardo Rodrigues | Gilles Sérasset | Linas Selmistraitis | Mahammadou Sidibé | Purificação Silvano | Blerina Spahiu | Enriketa Sogutlu | Ranka Stanković | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene | Slavko Zitnik | Katerina Zdravkova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.

2023

Towards a Conversational Web? A Benchmark for Analysing Semantic Change with Conversational Knowledge Bots and Linked Open Data
Florentina Armaselu | Elena-Simona Apostol | Christian Chiarcos | Anas Fahad Khan | Chaya Liebeskind | Barbara McGillivray | Ciprian-Octavian Truica | Andrius Utka | Giedrė Valūnaitė-Oleškevičienė
Proceedings of the 4th Conference on Language, Data and Knowledge

Adopting Linguistic Linked Data Principles: Insights on Users’ Experience
Verginica Mititelu | Maria Pia Di Buono | Hugo Gonçalo Oliveira | Blerina Spahiu | Giedrė Valūnaitė-Oleškevičienė
Proceedings of the 4th Conference on Language, Data and Knowledge

Validation of the Bigger Analogy Test Set Translation into Croatian, Lithuanian and Slovak
Radovan Garabík | Ana Ostroški Anić | Sigita Rackevičienė | Giedrė Valūnaitė-Oleškevičienė | Linas Selmistraitis | Andrius Utka
Proceedings of the 4th Conference on Language, Data and Knowledge

Workflow Reversal and Data Wrangling in Multilingual Diachronic Analysis and Linguistic Linked Open Data Modelling
Florentina Armaselu | Barbara McGillivray | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Andrius Utka | Daniela Gifu | Anas Fahad Khan | Elena-Simona Apostol | Ciprian-Octavian Truica
Proceedings of the 4th Conference on Language, Data and Knowledge

Validation of Language Agnostic Models for Discourse Marker Detection
Mariana Damova | Kostadin Mishev | Giedrė Valūnaitė-Oleškevičienė | Chaya Liebeskind | Purificação Silvano | Dimitar Trajanov | Ciprian-Octavian Truica | Elena-Simona Apostol | Christian Chiarcos | Anna Baczkowska
Proceedings of the 4th Conference on Language, Data and Knowledge

Multi-word Expressions as Discourse Markers in Multilingual TED-ELH Parallel Corpus
Giedrė Valūnaitė-Oleškevičienė | Chaya Liebeskind
Proceedings of the 4th Conference on Language, Data and Knowledge

2022

A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data
Fahad Khan | Christian Chiarcos | Thierry Declerck | Maria Pia Di Buono | Milan Dojchinovski | Jorge Gracia | Giedre Valunaite Oleskeviciene | Daniela Gifu
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey.

Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner | Sina Ahmadi | Elena-Simona Apostol | Julia Bosque-Gil | Christian Chiarcos | Milan Dojchinovski | Katerina Gkirtzou | Jorge Gracia | Dagmar Gromann | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Gilles Sérasset | Ciprian-Octavian Truică
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers
Purificação Silvano | Mariana Damova | Giedrė Valūnaitė Oleškevičienė | Chaya Liebeskind | Christian Chiarcos | Dimitar Trajanov | Ciprian-Octavian Truică | Elena-Simona Apostol | Anna Baczkowska
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Discourse markers carry information about the discourse structure and organization, and also signal local dependencies or the epistemological stance of the speaker. They provide instructions on how to interpret the discourse, and their study is paramount to understanding the mechanism underlying discourse organization. This paper presents a new language resource, an ISO-based annotated multilingual parallel corpus for discourse markers. The corpus comprises nine languages, Bulgarian, Lithuanian, German, European Portuguese, Hebrew, Romanian, Polish, and Macedonian, with English as a pivot language. In order to represent the meaning of the discourse markers, we propose an annotation scheme of discourse relations from ISO 24617-8 with a plug-in to ISO 24617-2 for communicative functions. We describe an experiment in which we applied the annotation scheme to assess its validity. The results reveal that, although some extensions are required to cover all the multilingual data, it provides a proper representation of the value of discourse markers. Additionally, we report some relevant contrastive phenomena concerning the interpretation of discourse markers and their role in discourse. This first step will allow us to develop deep learning methods to identify and extract discourse relations and communicative functions, and to represent that information as Linguistic Linked Open Data (LLOD).

Morphological Complexity of Children Narratives in Eight Languages
Gordana Hržica | Chaya Liebeskind | Kristina Š. Despot | Olga Dontcheva-Navratilova | Laura Kamandulytė-Merfeldienė | Sara Košutar | Matea Kramarić | Giedrė Valūnaitė Oleškevičienė
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The aim of this study was to compare the morphological complexity in a corpus representing the language production of younger and older children across different languages. The language samples were taken from the Frog Story subcorpus of the CHILDES corpora, which comprises oral narratives collected by various researchers between 1990 and 2005. We extracted narratives by typically developing, monolingual, middle-class children. Additionally, samples of Lithuanian language, collected according to the same principles, were added. The corpus comprises 249 narratives evenly distributed across eight languages: Croatian, English, French, German, Italian, Lithuanian, Russian and Spanish. Two subcorpora were formed for each language: a younger children corpus and an older children corpus. Four measures of morphological complexity were calculated for each subcorpus: Bane, Kolmogorov, Word entropy and Relative entropy of word structure. The results showed that younger children corpora had lower morphological complexity than older children corpora for all four measures for Spanish and Russian. Reversed results were obtained for English and French, and the results for the remaining four languages showed variation. Relative entropy of word structure proved to be indicative of age differences. Word entropy and relative entropy of word structure show potential to demonstrate typological differences.

2021

Multiword expressions as discourse markers in Hebrew and Lithuanian
Giedre Valunaite Oleskeviciene | Chaya Liebeskind
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age