Špela Vintar

Also published as: Spela Vintar

2024

How Human-Like Are Word Associations in Generative Models? An Experiment in Slovene
Špela Vintar | Mojca Brglez | Aleš Žagar
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024

Large language models (LLMs) show extraordinary performance in a broad range of cognitive tasks, yet their capability to reproduce human semantic similarity judgements remains disputed. We report an experiment in which we fine-tune two LLMs for Slovene, a monolingual SloT5 and a multilingual mT5, as well as an mT5 for English, to generate word associations. The models are fine-tuned on human word association norms created within the Small World of Words project, which recently started to collect data for Slovene. Since our aim was to explore differences between human and model-generated outputs, the model parameters were minimally adjusted to fit the association task. We perform automatic evaluation using a set of methods to measure the overlap and ranking, and in addition a subset of human and model-generated responses was manually classified into four categories (meaning-, position- and form-based, and erratic). Results show that human-machine overlap is very small, but that the models produce a similar distribution of association categories as humans.

2020

The NetViz terminology visualization tool and the use cases in karstology domain modeling
Senja Pollak | Vid Podpečan | Dragana Miljkovic | Uroš Stepišnik | Špela Vintar
Proceedings of the 6th International Workshop on Computational Terminology

We present the NetViz terminology visualization tool and apply it to the domain modeling of karstology, a subfield of geography studying karst phenomena. The developed tool allows for high-performance online network visualization where the user can upload the terminological data in a simple CSV format, define the nodes (terms, categories), edges (relations) and their properties (by assigning different node colors), and then edit and interactively explore domain knowledge in the form of a network. We showcase the usefulness of the tool on examples from the karstology domain, where in the first use case we visualize the domain knowledge as represented in a manually annotated corpus of domain definitions, while in the second use case we show the power of visualization for domain understanding by visualizing automatically extracted knowledge in the form of triplets extracted from the karstology domain corpus. The application is entirely web-based without any need for downloading or special configuration. The source code of the web application is also available under the permissive MIT license, allowing future extensions for developing new terminological applications.

Mining Semantic Relations from Comparable Corpora through Intersections of Word Embeddings
Špela Vintar | Larisa Grčić Simeunović | Matej Martinc | Senja Pollak | Uroš Stepišnik
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

We report an experiment aimed at extracting words expressing a specific semantic relation using intersections of word embeddings. In a multilingual frame-based domain model, specific features of a concept are typically described through a set of non-arbitrary semantic relations. In karstology, our domain of choice which we are exploring through a comparable corpus in English and Croatian, karst phenomena such as landforms are usually described through their FORM, LOCATION, CAUSE, FUNCTION and COMPOSITION. We propose an approach to mine words pertaining to each of these relations by using a small number of seed adjectives, for which we retrieve closest words using word embeddings and then use intersections of these neighbourhoods to refine our search. Such cross-language expansion of semantically rich vocabulary is a valuable aid in improving the coverage of a multilingual knowledge base, but also in exploring differences between languages in their respective conceptualisations of the domain.

The paper presents an innovative approach to extract Slovene definition candidates from domain-specific corpora using morphosyntactic patterns, automatic terminology recognition and semantic tagging with wordnet senses. First, a classification model was trained on examples from Slovene Wikipedia which was then used to find well-formed definitions among the extracted candidates. The results of the experiment are encouraging, with accuracy ranging from 67% to 71%. The paper also addresses some drawbacks of the approach and suggests ways to overcome them in future work.

2008

Harvesting Multi-Word Expressions from Parallel Corpora
Špela Vintar | Darja Fišer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multi-word expressions. In the first approach multi-word expressions from Princeton WordNet are translated with a technique that is based on word-alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multi-word expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results show that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches, which are able to identify more term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed.

2004

2002

Evaluation Corpora for Sense Disambiguation in the Medical Domain
Diana Raileanu | Paul Buitelaar | Spela Vintar | Jörg Bay
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

An Efficient and Flexible Format for Linguistic and Semantic Annotation
Špela Vintar | Paul Buitelaar | Bärbel Ripplinger | Bogdan Sacaleanu | Diana Raileanu | Detlef Prescher
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Extracting Terms and Terminological Collocations from the ELAN Slovene–English Parallel Corpus
Špela Vintar
5th EAMT Workshop: Harvesting Existing Resources

Venues

EAMT (1)

HyTra (1)
