Voula Giouli

Also published as: V. Giouli


2024

pdf bib
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Archna Bhatia | Gosse Bouma | A. Seza Doğruöz | Kilian Evang | Marcos Garcia | Voula Giouli | Lifeng Han | Joakim Nivre | Alexandre Rademaker
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

pdf bib
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy, and the Lexicon-Corpus Interface
Verginica Barbu Mititelu | Voula Giouli | Kilian Evang | Daniel Zeman | Petya Osenova | Carole Tiberius | Simon Krek | Stella Markantonatou | Ivelina Stoyanova | Ranka Stanković | Christian Chiarcos
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

We present ongoing work towards defining a lexicon-corpus interface to serve as a benchmark in the representation of multiword expressions (of various parts of speech) in dedicated lexica and the linking of these entries to their corpus occurrences. The final aim is the harnessing of such resources for the automatic identification of multiword expressions in a text. The involvement of several natural languages aims at the universality of a solution not centered on a particular language, and also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword expressions are discussed, the current status of lexica dedicated to this linguistic phenomenon is outlined, as well as the solution we envisage for creating an ecosystem of interlinked lexica and corpora containing and, respectively, annotated with multiword expressions.

pdf bib
UniDive: A COST Action on Universality, Diversity and Idiosyncrasy in Language Technology
Agata Savary | Daniel Zeman | Verginica Barbu Mititelu | Anabela Barreiro | Olesea Caftanatov | Marie-Catherine de Marneffe | Kaja Dobrovoljc | Gülşen Eryiğit | Voula Giouli | Bruno Guillaume | Stella Markantonatou | Nurit Melnik | Joakim Nivre | Atul Kr. Ojha | Carlos Ramisch | Abigail Walsh | Beata Wójtowicz | Alina Wróblewska
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

This paper presents the objectives, organization and activities of the UniDive COST Action, a scientific network dedicated to universality, diversity and idiosyncrasy in language technology. We describe the objectives and organization of this initiative, the people involved, the working groups and the ongoing tasks and activities. This paper is also an pen call for participation towards new members and countries.

2023

pdf bib
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Archna Bhatia | Kilian Evang | Marcos Garcia | Voula Giouli | Lifeng Han | Shiva Taslimipoor
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

pdf bib
PARSEME corpus release 1.3
Agata Savary | Cherifa Ben Khelil | Carlos Ramisch | Voula Giouli | Verginica Barbu Mititelu | Najet Hadj Mohamed | Cvetana Krstev | Chaya Liebeskind | Hongzhi Xu | Sara Stymne | Tunga Güngör | Thomas Pickard | Bruno Guillaume | Eduard Bejček | Archna Bhatia | Marie Candito | Polona Gantar | Uxoa Iñurrieta | Albert Gatt | Jolanta Kovalevskaite | Timm Lichte | Nikola Ljubešić | Johanna Monti | Carla Parra Escartín | Mehrnoush Shamsfard | Ivelina Stoyanova | Veronika Vincze | Abigail Walsh
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus represents 26 languages now. All monolingual corpora therein use Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts impact on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.

2022

pdf bib
Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: the Mythotopia Geo-tagged Corpus
Voula Giouli | Anna Vacalopoulou | Nikolaos Sidiropoulos | Christina Flouda | Athanasios Doupas | Giorgos Giannopoulos | Nikos Bikakis | Vassilis Kaffes | Gregory Stainhaouer
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The paper gives an account of an infrastructure that will be integrated into a platform aimed at providing a multi-faceted experience to visitors of Northern Greece using mythology as a starting point. This infrastructure comprises a multi-lingual and multi-modal corpus (i.e., a corpus of textual data supplemented with images, and video) that belongs to the humanities domain along with a dedicated database (content management system) with advanced indexing, linking and search functionalities. We will present the corpus itself focusing on the content, the methodology adopted for its development, and the steps taken towards rendering it accessible via the database in a way that also facilitates useful visualizations. In this context, we tried to address three main challenges: (a) to add a novel annotation layer, namely geotagging, (b) to ensure the long-term maintenance of and accessibility to the highly heterogeneous primary data – even after the life cycle of the current project – by adopting a metadata schema that is compatible to existing standards; and (c) to render the corpus a useful resource to scholarly research in the digital humanities by adding a minimum set of linguistic annotations.

2020

pdf bib
Greek within the Global FrameNet Initiative: Challenges and Conclusions so far
Voula Giouli | Vera Pilitsidou | Hephaestion Christopoulos
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet

Large coverage lexical resources that bear deep linguistic information have always been considered useful for many natural language processing (NLP) applications including Machine Translation (MT). In this respect, Frame-based resources have been developed for many languages following Frame Semantics and the Berkeley FrameNet project. However, to a great extent, all those efforts have been kept fragmented. Consequentially, the Global FrameNet initiative has been conceived of as a joint effort to bring together FrameNets in different languages. The proposed paper is aimed at describing ongoing work towards developing the Greek (EL) counterpart of the Global FrameNet and our efforts to contribute to the Shared Annotation Task. In the paper, we will elaborate on the annotation methodology employed, the current status and progress made so far, as well as the problems raised during annotation.

pdf bib
Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions
Carlos Ramisch | Agata Savary | Bruno Guillaume | Jakub Waszczuk | Marie Candito | Ashwini Vaidya | Verginica Barbu Mititelu | Archna Bhatia | Uxoa Iñurrieta | Voula Giouli | Tunga Güngör | Menghan Jiang | Timm Lichte | Chaya Liebeskind | Johanna Monti | Renata Ramisch | Sara Stymne | Abigail Walsh | Hongzhi Xu
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs). Lessons learned from previous editions indicate that VMWEs have low ambiguity, and that the major challenge lies in identifying test instances never seen in the training data. Therefore, this edition focuses on unseen VMWEs. We have split annotated corpora so that the test corpora contain around 300 unseen VMWEs, and we provide non-annotated raw corpora to be used by complementary discovery methods. We released annotated and raw corpora in 14 languages, and this semi-supervised challenge attracted 7 teams who submitted 9 system results. This paper describes the effort of corpus creation, the task design, and the results obtained by the participating systems, especially their performance on unseen expressions.

2018

pdf bib
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

2017

pdf bib
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.

2014

pdf bib
Encoding MWEs in a conceptual lexicon
Aggeliki Fotopoulou | Stella Markantonatou | Voula Giouli
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

pdf bib
Linguistically motivated Language Resources for Sentiment Analysis
Voula Giouli | Aggeliki Fotopoulou
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

2009

pdf bib
A Web-Enabled and Speech-Enhanced Parallel Corpus of Greek-Bulgarian Cultural Texts
Voula Giouli | Nikos Glaros | Kiril Simov | Petya Osenova
Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009)

2008

pdf bib
Building a Greek corpus for Textual Entailment
Evi Marzelou | Maria Zourari | Voula Giouli | Stelios Piperidis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper reports on completed work aimed at the creation of a resource, namely, the Greek Textual Entailment Corpus (GTEC) that is appropriate for guiding training and evaluation of a system that recognizes Textual Entailment in Greek texts. The corpus of textual units was collected in view of a range of NLP applications, where semantic interpretation is of paramount importance, and it was manually annotated at the level of Textual Entailment. Moreover, a number of linguistic annotations were also integrated that were deemed useful for prospect system developers. The critical issue was the development of a final resource that is re-usable and adaptable to different NLP systems, in order to either enhance their accuracy or to evaluate their output. We are hereby focusing on the methodological issues underpinning data selection and annotation. An initial approach towards the development of a system catering for the automatic Recognition of Textual Entailment in Greek is also presented and preliminary results are reported.

2006

pdf bib
Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology
Maria Gavrilidou | Penny Labropoulou | Stelios Piperidis | Voula Giouli | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Khalid Choukri
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development approach and their final constitution. Building on the experience gained in the project, a production model has been elaborated, suggesting ways and techniques that can be exploited in order to improve LRs production taking into account realistic issues.

pdf bib
Multi-domain Multi-lingual Named Entity Recognition: Revisiting & Grounding the resources issue
Voula Giouli | Alexis Konstandinidis | Elina Desypri | Harris Papageorgiou
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper reports on the development methodology of a system aimed at multi-domain multi-lingual recognition and classification of names in texts, the focus being on the linguistic resources used for training and testing purposes. The corpus presented here has been collected and annotated in the framework of different projects the critical issue being the development of a final resource that is homogenous, re-usable and adaptable to different domains and languages with a view to robust multi-domain and multi-lingual NERC.

2004

pdf bib
Building Parallel Corpora for eContent Professionals
M. Gavrilidou | P. Labropoulou | E. Desipri | V. Giouli | V. Antonopoulos | S. Piperidis
Proceedings of the Workshop on Multilingual Linguistic Resources

2002

pdf bib
Multi-level XML-based Corpus Annotation
Harris Papageorgiou | Prokopis Prokopidis | Voula Giouli | Iason Demiros | Alexis Konstantinidis | Stelios Piperidis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

pdf bib
Named Entity Recognition in Greek Texts
Iason Demiros | Sotiris Boutsis | Voula Giouli | Maria Liakata | Harris Papageorgiou | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
A Robust Parser for Unrestricted Greek Text
Sotiris Boutsis | Prokopis Prokopidis | Voula Giouli | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
A Unified POS Tagging Architecture and its Application to Greek
Harris Papageorgiou | Prokopis Prokopidis | Voula Giouli | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Search
Co-authors
Venues