Stelios Piperidis

Also published as: S. Piperidis, Stelios Piperdis


2024

Investigating Political Ideologies through the Greek ParlaMint corpus
Maria Gavriilidou | Dimitris Gkoumas | Stelios Piperidis | Prokopis Prokopidis
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

This paper has two objectives: to present (a) the creation of ParlaMint-GR, the Greek part of the ParlaMint corpora of debates in the parliaments of Europe, and (b) preliminary results on its comparison with a corpus of Greek party manifestos, aiming at the investigation of the ideologies of the Greek political parties and members of the Parliament. Additionally, a gender-related comparison is explored. The creation of the ParlaMint-GR corpus is discussed, together with the solutions adopted for various challenges faced. The corpus of party manifestos, available through CLARIN:EL, serves for a comparative study with the corpus of speeches delivered by the members of the Greek Parliament, with the aim of identifying the ideological positions of parties and politicians.

Enhancing Scientific Discourse: Machine Translation for the Scientific Domain
Dimitris Roussis | Sokratis Sofianopoulos | Stelios Piperidis
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

The increasing volume of scientific research necessitates effective communication across language barriers. Machine translation (MT) offers a promising solution for accessing international publications. However, the scientific domain presents unique challenges due to its specialized vocabulary and complex sentence structures. In this paper, we present the development of a collection of parallel and monolingual corpora from the scientific domain. The corpora target the language pairs Spanish-English, French-English, and Portuguese-English. For each language pair, we create a large general scientific corpus as well as four smaller corpora focused on the research domains of Energy Research, Neuroscience, Cancer and Transportation. To evaluate the quality of these corpora, we utilize them for fine-tuning general-purpose neural machine translation (NMT) systems. We provide details regarding the corpus creation process and the fine-tuning strategies employed, and we conclude with the evaluation results.

Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024
Federico Gaspari | Joss Moorkens | Itziar Aldabe | Aritz Farwell | Begona Altuna | Stelios Piperidis | Georg Rehm | German Rigau
Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024

Common European Language Data Space
Georg Rehm | Stelios Piperidis | Khalid Choukri | Andrejs Vasiļjevs | Katrin Marheinecke | Victoria Arranz | Aivars Bērziņš | Miltos Deligiannis | Dimitris Galanis | Maria Giagkou | Katerina Gkirtzou | Dimitris Gkoumas | Annika Grützner-Zahn | Athanasia Kolovou | Penny Labropoulou | Andis Lagzdiņš | Elena Leitner | Valérie Mapelli | Hélène Mazo | Simon Ostermann | Stefania Racioppa | Mickaël Rigault | Leon Voukoutis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The Common European Language Data Space (LDS) is an integral part of the EU data strategy, which aims at developing a single market for data. Its decentralised technical infrastructure and governance scheme are currently being developed by the LDS project, which also has dedicated tasks for proof-of-concept prototypes, handling legal aspects, raising awareness and promoting the LDS through events and social media channels. The LDS is part of a broader vision for establishing all necessary components to develop European large language models.

European Language Grid: One Year after
Georg Rehm | Stelios Piperidis | Dimitris Galanis | Penny Labropoulou | Maria Giagkou | Miltos Deligiannis | Leon Voukoutis | Martin Courtois | Julian Moreno-Schneider | Katrin Marheinecke
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The European Language Grid (ELG) is a cloud platform for the whole European Language Technology community. While the EU project that developed the platform successfully concluded in June 2022, the ELG initiative has continued. This article provides a description of the current state of ELG in terms of user adoption and number of language resources and technologies available in early 2024. It also provides an overview of the various activities with regard to ELG since the end of the project and since the publication of the ELG book, especially the co-authors’ attempt to integrate the ELG platform into various data space initiatives. The article also provides an overview of the Digital Language Equality (DLE) dashboard and the current state of DLE in Europe.

2022

Proceedings of the Thirteenth Language Resources and Evaluation Conference
Nicoletta Calzolari | Frédéric Béchet | Philippe Blache | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Jan Odijk | Stelios Piperidis
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Constructing Parallel Corpora from COVID-19 News using MediSys Metadata
Dimitrios Roussis | Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a collection of parallel corpora generated by exploiting the COVID-19 related dataset of metadata created with the Europe Media Monitor (EMM) / Medical Information System (MediSys) processing chain of news articles. We describe how we constructed comparable monolingual corpora of news articles related to the current pandemic and used them to mine about 11.2 million segment alignments in 26 EN-X language pairs, covering most official EU languages plus Albanian, Arabic, Icelandic, Macedonian, and Norwegian. Subsets of this collection have been used in shared tasks (e.g. Multilingual Semantic Search, Machine Translation) aimed at accelerating the creation of resources and tools needed to facilitate access to information in the COVID-19 emergency situation.

SciPar: A Collection of Parallel Corpora from Scientific Abstracts
Dimitrios Roussis | Vassilis Papavassiliou | Prokopis Prokopidis | Stelios Piperidis | Vassilis Katsouros
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents SciPar, a new collection of parallel corpora created from openly available metadata of bachelor theses, master theses and doctoral dissertations hosted in institutional repositories, digital libraries of universities and national archives. We describe first how we harvested and processed metadata from 86, mainly European, repositories to extract bilingual titles and abstracts, and then how we mined high quality sentence pairs in a wide range of scientific areas and sub-disciplines. In total, the resource includes 9.17 million segment alignments in 31 language pairs and is publicly available via the ELRC-SHARE repository. The bilingual corpora in this collection could prove valuable in various applications, such as cross-lingual plagiarism detection or adapting Machine Translation systems for the translation of scientific texts and academic writing in general, especially for language pairs which include English.

Overview of the ELE Project
Itziar Aldabe | Jane Dunne | Aritz Farwell | Owen Gallagher | Federico Gaspari | Maria Giagkou | Jan Hajic | Jens Peter Kückens | Teresa Lynn | Georg Rehm | German Rigau | Katrin Marheinecke | Stelios Piperidis | Natalia Resende | Tea Vojtěchová | Andy Way
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This paper provides an overview of the ongoing European Language Equality (ELE) project, an 18-month action funded by the European Commission which involves 52 partners. The primary goal of ELE is to prepare the European Language Equality Programme, in the form of a strategic research, innovation and implementation agenda and a roadmap for achieving full digital language equality (DLE) in Europe by 2030.

Categorizing legal features in a metadata-oriented task: defining the conditions of use
Mickaël Rigault | Victoria Arranz | Valérie Mapelli | Penny Labropoulou | Stelios Piperidis
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference

Introducing the Digital Language Equality Metric: Technological Factors
Federico Gaspari | Owen Gallagher | Georg Rehm | Maria Giagkou | Stelios Piperidis | Jane Dunne | Andy Way
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference

This paper introduces the concept of Digital Language Equality (DLE) developed by the EU-funded European Language Equality (ELE) project, and describes the associated DLE Metric with a focus on its technological factors (TFs), which are complemented by situational contextual factors. This work aims at objectively describing the level of technological support of all European languages and lays the foundation to implement a large-scale EU-wide programme to ensure that these languages can continue to exist and prosper in the digital age, to serve the present and future needs of their speakers. The paper situates this ongoing work with a strong European focus in the broader context of related efforts, and explains how the DLE Metric can help track the progress towards DLE for all languages of Europe, focusing in particular on the role played by the TFs. These are derived from the European Language Grid (ELG) Catalogue, which provides the empirical basis to measure the level of digital readiness of all European languages. The DLE Metric scores can be consulted through an online interactive dashboard to show the level of technological support of each European language and track the overall progress toward DLE.

Collaborative Metadata Aggregation and Curation in Support of Digital Language Equality Monitoring
Maria Giagkou | Stelios Piperidis | Penny Labropoulou | Miltos Deligiannis | Athanasia Kolovou | Leon Voukoutis
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference

The European Language Equality (ELE) project develops a strategic research, innovation and implementation agenda (SRIA) and a roadmap for achieving full digital language equality in Europe by 2030. A key component of the SRIA development is an accurate estimation of the current standing of languages with respect to their technological readiness. In this paper we present the empirical basis on which such estimation is grounded, its starting point and in particular the automatic and collaborative methods used for extending it. We focus on the collaborative expert activities, the challenges posed, and the solutions adopted. We also briefly present the dashboard application developed for querying and visualising the empirical data as well as monitoring and comparing the evolution of technological support within and across languages.

2021

European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm | Stelios Piperidis | Kalina Bontcheva | Jan Hajic | Victoria Arranz | Andrejs Vasiļjevs | Gerhard Backfried | Jose Manuel Gomez-Perez | Ulrich Germann | Rémi Calizzano | Nils Feldhus | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Julian Moreno-Schneider | Dimitris Galanis | Penny Labropoulou | Miltos Deligiannis | Katerina Gkirtzou | Athanasia Kolovou | Dimitris Gkoumas | Leon Voukoutis | Ian Roberts | Jana Hamrlova | Dusan Varis | Lukas Kacena | Khalid Choukri | Valérie Mapelli | Mickaël Rigault | Julija Melnika | Miro Janosik | Katja Prinz | Andres Garcia-Silva | Cristian Berrio | Ondrej Klejch | Steve Renals
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.

2020

Proceedings of the Twelfth Language Resources and Evaluation Conference
Nicoletta Calzolari | Frédéric Béchet | Philippe Blache | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Twelfth Language Resources and Evaluation Conference

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Georg Rehm | Katrin Marheinecke | Stefanie Hegele | Stelios Piperidis | Kalina Bontcheva | Jan Hajič | Khalid Choukri | Andrejs Vasiļjevs | Gerhard Backfried | Christoph Prinz | José Manuel Gómez-Pérez | Luc Meertens | Paul Lukowicz | Josef van Genabith | Andrea Lösch | Philipp Slusallek | Morten Irgens | Patrick Gatellier | Joachim Köhler | Laure Le Bars | Dimitra Anastasiou | Albina Auksoriūtė | Núria Bel | António Branco | Gerhard Budin | Walter Daelemans | Koenraad De Smedt | Radovan Garabík | Maria Gavriilidou | Dagmar Gromann | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Jan Odijk | Maciej Ogrodniczuk | Eiríkur Rögnvaldsson | Mike Rosner | Bolette Pedersen | Inguna Skadiņa | Marko Tadić | Dan Tufiș | Tamás Váradi | Kadri Vider | Andy Way | François Yvon
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

European Language Grid: An Overview
Georg Rehm | Maria Berger | Ela Elsholz | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Stelios Piperidis | Miltos Deligiannis | Dimitris Galanis | Katerina Gkirtzou | Penny Labropoulou | Kalina Bontcheva | David Jones | Ian Roberts | Jan Hajič | Jana Hamrlová | Lukáš Kačena | Khalid Choukri | Victoria Arranz | Andrejs Vasiļjevs | Orians Anvari | Andis Lagzdiņš | Jūlija Meļņika | Gerhard Backfried | Erinç Dikici | Miroslav Janosik | Katja Prinz | Christoph Prinz | Severin Stampler | Dorothea Thomas-Aniola | José Manuel Gómez-Pérez | Andres Garcia Silva | Christian Berrío | Ulrich Germann | Steve Renals | Ondrej Klejch
Proceedings of the Twelfth Language Resources and Evaluation Conference

With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented – by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 national competence centres and the European LT Council for outreach and coordination purposes.

Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid
Penny Labropoulou | Katerina Gkirtzou | Maria Gavriilidou | Miltos Deligiannis | Dimitris Galanis | Stelios Piperidis | Georg Rehm | Maria Berger | Valérie Mapelli | Michael Rigault | Victoria Arranz | Khalid Choukri | Gerhard Backfried | José Manuel Gómez-Pérez | Andres Garcia-Silva
Proceedings of the Twelfth Language Resources and Evaluation Conference

The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies (processing and generation services and tools, models, corpora, term lists, etc.), as well as related entities (e.g., organizations, projects, supporting documents, etc.). The schema powers the European Language Grid platform that aims to be the primary hub and marketplace for industry-relevant Language Technology in Europe. ELG-SHARE has been based on various metadata schemas, vocabularies, and ontologies, as well as related recommendations and guidelines.

Verbal Aggression as an Indicator of Xenophobic Attitudes in Greek Twitter during and after the Financial Crisis
Maria Pontiki | Maria Gavriilidou | Dimitris Gkoumas | Stelios Piperidis
Proceedings of the Workshop about Language Resources for the SSH Cloud

We present a replication of a data-driven and linguistically inspired Verbal Aggression analysis framework that was designed to examine Twitter verbal attacks against predefined target groups of interest as an indicator of xenophobic attitudes during the financial crisis in Greece, in particular during the period 2013-2016. The research goal in this paper is to re-examine Verbal Aggression as an indicator of xenophobic attitudes in Greek Twitter three years later, in order to trace possible changes regarding the main targets, the types and the content of the verbal attacks against the same targets in the post-crisis era, given also the ongoing refugee crisis and the political landscape in Greece as it was shaped after the elections in 2019. The results indicate an interesting rearrangement of the main targets of the verbal attacks, while the content and the types of the attacks provide valuable insights about the way these targets are being framed as compared to the respective dominant perceptions and stereotypes about them during the period 2013-2016.

Proceedings of the 1st International Workshop on Language Technology Platforms
Georg Rehm | Kalina Bontcheva | Khalid Choukri | Jan Hajič | Stelios Piperidis | Andrejs Vasiļjevs
Proceedings of the 1st International Workshop on Language Technology Platforms

CLARIN: Distributed Language Resources and Technology in a European Infrastructure
Maria Eskevich | Franciska de Jong | Alexander König | Darja Fišer | Dieter Van Uytvanck | Tero Aalto | Lars Borin | Olga Gerassimenko | Jan Hajic | Henk van den Heuvel | Neeme Kahusk | Krista Liin | Martin Matthiesen | Stelios Piperidis | Kadri Vider
Proceedings of the 1st International Workshop on Language Technology Platforms

CLARIN is a European Research Infrastructure providing access to digital language resources and tools from across Europe and beyond to researchers in the humanities and social sciences. This paper focuses on CLARIN as a platform for the sharing of language resources. It zooms in on the service offer for the aggregation of language repositories and the value proposition for a number of communities that benefit from the enhanced visibility of their data and services as a result of integration in CLARIN. The enhanced findability of language resources is serving the social sciences and humanities (SSH) community at large and supports research communities that aim to collaborate based on virtual collections for a specific domain. The paper also addresses the wider landscape of service platforms based on language technologies which has the potential of becoming a powerful set of interoperable facilities to a variety of communities of use.

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Georg Rehm | Dimitris Galanis | Penny Labropoulou | Stelios Piperidis | Martin Welß | Ricardo Usbeck | Joachim Köhler | Miltos Deligiannis | Katerina Gkirtzou | Johannes Fischer | Christian Chiarcos | Nils Feldhus | Julian Moreno-Schneider | Florian Kintzel | Elena Montiel | Víctor Rodríguez Doncel | John Philip McCrae | David Laqua | Irina Patricia Theile | Christian Dittmar | Kalina Bontcheva | Ian Roberts | Andrejs Vasiļjevs | Andis Lagzdiņš
Proceedings of the 1st International Workshop on Language Technology Platforms

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.

2018

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Nicoletta Calzolari | Khalid Choukri | Christopher Cieri | Thierry Declerck | Sara Goggi | Koiti Hasida | Hitoshi Isahara | Bente Maegaard | Joseph Mariani | Hélène Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis | Takenobu Tokunaga
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Managing Public Sector Data for Multilingual Applications Development
Stelios Piperidis | Penny Labropoulou | Miltos Deligiannis | Maria Giagkou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
Andrea Lösch | Valérie Mapelli | Stelios Piperidis | Andrejs Vasiļjevs | Lilli Smal | Thierry Declerck | Eileen Schnur | Khalid Choukri | Josef van Genabith
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Discovering Parallel Language Resources for Training MT Engines
Vassilis Papavassiliou | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The ILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task
Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission of the Institute for Language and Speech Processing/Athena Research and Innovation Center (ILSP/ARC) for the WMT 2018 Parallel Corpus Filtering shared task. We describe several properties of sentences and sentence pairs that our system explored in the context of the task with the purpose of clustering sentence pairs according to their appropriateness for training MT systems. We also discuss alternative methods for ranking the sentence pairs of the most appropriate clusters with the aim of generating the two datasets (of 10 and 100 million words, as required in the task) that were evaluated. Across the several experiments carried out by the organizers during the evaluation phase, our submission achieved an average BLEU score of 26.41, even though it does not make use of any language-specific resources such as bilingual lexica, monolingual corpora, or MT output, while the average score of the best participant system was 27.91.

2016

The ILSP/ARC submission to the WMT 2016 Bilingual Document Alignment Shared Task
Vassilis Papavassiliou | Prokopis Prokopidis | Stelios Piperidis
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Sara Goggi | Marko Grobelnik | Bente Maegaard | Joseph Mariani | Helene Mazo | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories
Prokopis Prokopidis | Vassilis Papavassiliou | Stelios Piperidis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a new collection of multilingual corpora automatically created from the content available in the Global Voices websites, where volunteers have been posting and translating citizen media stories since 2004. We describe how we crawled and processed this content to generate parallel resources comprising 302.6K document pairs and 8.36M segment alignments in 756 language pairs. For some language pairs, the segment alignments in this resource are the first open examples of their kind. In an initial use of this resource, we discuss how a set of document pair detection algorithms performs on the Greek-English corpus.

2015

A Data Sharing and Annotation Service Infrastructure
Stelios Piperidis | Dimitrios Galanis | Juli Bakagianni | Sokratis Sofianopoulos
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Hrafn Loftsson | Bente Maegaard | Joseph Mariani | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.

META-SHARE: One year after
Stelios Piperidis | Harris Papageorgiou | Christian Spurk | Georg Rehm | Khalid Choukri | Olivier Hamon | Nicoletta Calzolari | Riccardo del Gratta | Bernardo Magnini | Christian Girardi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents META-SHARE (www.meta-share.eu), an open language resource infrastructure, and its usage since its Europe-wide deployment in early 2013. META-SHARE is a network of repositories that store language resources (data, tools and processing services) documented with high-quality metadata, aggregated in central inventories allowing for uniform search and access. META-SHARE was developed by META-NET (www.meta-net.eu) and aims to serve as an important component of a language technology marketplace for researchers, developers, professionals and industrial players, catering for the full development cycle of language technology, from research through to innovative products and services. The observed usage in its initial steps, the steadily increasing number of network nodes, resources, users, queries, views and downloads are all encouraging and considered as supportive of the choices made so far. In tandem, take-up activities like direct linking and processing of datasets by language processing services as well as metadata transformation to RDF are expected to open new avenues for data and resources linking and boost the organic growth of the infrastructure while facilitating language technology deployment by much wider research communities and industrial sectors.

2013

QTLaunchpad
Stephen Doherty | Declan Groves | Josef van Genabith | Arle Lommel | Aljoscha Burchardt | Hans Uszkoreit | Lucia Specia | Stelios Piperidis
Proceedings of Machine Translation Summit XIV: European projects

2012

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Nicoletta Calzolari | Khalid Choukri | Thierry Declerck | Mehmet Uğur Doğan | Bente Maegaard | Joseph Mariani | Asuncion Moreno | Jan Odijk | Stelios Piperidis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The FLaReNet Strategic Language Resource Agenda
Claudia Soria | Núria Bel | Khalid Choukri | Joseph Mariani | Monica Monachini | Jan Odijk | Stelios Piperidis | Valeria Quochi | Nicoletta Calzolari
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The FLaReNet Strategic Agenda highlights the most pressing needs for the sector of Language Resources and Technologies and presents a set of recommendations for its development and progress in Europe, as issued from a three-year consultation of the FLaReNet European project. The FLaReNet recommendations are organised around nine dimensions: a) documentation b) interoperability c) availability, sharing and distribution d) coverage, quality and adequacy e) sustainability f) recognition g) development h) infrastructure and i) international cooperation. As such, they cover a broad range of topics and activities, spanning over production and use of language resources, licensing, maintenance and preservation issues, infrastructures for language resources, resource identification and sharing, evaluation and validation, interoperability and policy issues. The intended recipients belong to a large set of players and stakeholders in Language Resources and Technology, ranging from individuals to research and education institutions, to policy-makers, funding agencies, SMEs and large companies, service and media providers. The main goal of these recommendations is to serve as an instrument to support stakeholders in planning for and addressing the urgencies of the Language Resources and Technologies of the future.

pdf bib
The META-SHARE Metadata Schema for the Description of Language Resources
Maria Gavrilidou | Penny Labropoulou | Elina Desipri | Stelios Piperidis | Haris Papageorgiou | Monica Monachini | Francesca Frontini | Thierry Declerck | Gil Francopoulo | Victoria Arranz | Valerie Mapelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing. It places the model in the overall framework of metadata models, describes the basic principles and features of the model, elaborates on the distinction between minimal and maximal versions thereof, briefly presents the integrated environment supporting the LRs description and search and retrieval processes and concludes with work to be done in the future for the improvement of the model.

pdf bib
The META-SHARE Language Resources Sharing Infrastructure: Principles, Challenges, Solutions
Stelios Piperidis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Language resources have become a key factor in the development cycle of language technology. The current prevailing methodologies, the sheer number of languages and the vast volumes of digital content together with the wide palette of useful content processing applications, render new models for managing the underlying language resources indispensable. This paper presents META-SHARE, an open resource exchange infrastructure, which aims to boost visibility, documentation, identification, openness and sharing, collaboration, preservation and interoperability of language data and basic language processing tools. META-SHARE is implemented as a network of distributed repositories of language resources. It offers providers and consumers of resources the necessary functionalities for describing, storing, searching, licensing and downloading language resources in a single integrated technical platform. META-SHARE favours and aligns itself with the growing open data and open source tools movement. To this end, it has prepared the necessary underlying legal framework consisting of a Charter for language resource sharing, as well as a set of licensing templates aiming to act as recommended licence models in an attempt to facilitate the legal interoperability of language resources. In its current version, META-SHARE features 13 resource repositories, with over 1200 resource packages.

2011

pdf bib
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm
Nicoletta Calzolari | Toru Ishida | Stelios Piperidis | Virach Sornlertlamvanich
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

pdf bib
A Metadata Schema for the Description of Language Resources (LRs)
Maria Gavrilidou | Penny Labropoulou | Stelios Piperidis | Monica Monachini | Francesca Frontini | Gil Francopoulo | Victoria Arranz | Valérie Mapelli
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

pdf bib
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage
Cristina Vertan | Milena Slavcheva | Petya Osenova | Stelios Piperidis
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage

2010

bib
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Nicoletta Calzolari | Khalid Choukri | Bente Maegaard | Joseph Mariani | Jan Odijk | Stelios Piperidis | Mike Rosner | Daniel Tapias
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
The LREC Map of Language Resources and Technologies
Nicoletta Calzolari | Claudia Soria | Riccardo Del Gratta | Sara Goggi | Valeria Quochi | Irene Russo | Khalid Choukri | Joseph Mariani | Stelios Piperidis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present the LREC Map of Language Resources and Tools, an innovative feature introduced with this LREC. The purpose of the Map is to shed light on the vast amount of resources and tools that represent the background of the research presented at LREC, in the attempt to fill in a gap in the community knowledge about the resources and tools that are used or created worldwide. It also aims at a change of culture in the field, actively engaging each researcher in the documentation task about resources. The Map has been developed on the basis of the information provided by LREC authors during the submission of papers to the LREC 2010 conference and the LREC workshops, and contains information about almost 2000 resources. The paper illustrates the motivation behind this initiative, its main characteristics, its relevance and future impact in the field, the metadata used to describe the resources, and finally presents some of the most relevant findings.

pdf bib
Resource and Service Centres as the Backbone for a Sustainable Service Infrastructure
Peter Wittenburg | Nuria Bel | Lars Borin | Gerhard Budin | Nicoletta Calzolari | Eva Hajicova | Kimmo Koskenniemi | Lothar Lemnitzer | Bente Maegaard | Maciej Piasecki | Jean-Marie Pierrel | Stelios Piperidis | Inguna Skadina | Dan Tufis | Remco van Veenendaal | Tamas Váradi | Martin Wynne
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, research infrastructures are being designed and established in many disciplines since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects that were carried out during the last years and wants to build stable and robust services that can be used by researchers. Service centres that have the potential of being persistent and that adhere to the criteria established by CLARIN will play an important role here. In the last year of the so-called preparatory phase these centres are currently developing four use cases that can demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criteria of being cross-national.

2009

pdf bib
Machine Translation and its Philosophical Accounts
Stelios Piperidis
Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?

pdf bib
Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages
Elena Paskaleva | Stelios Piperidis | Milena Slavcheva | Cristina Vertan
Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages

2008

bib
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Nicoletta Calzolari | Khalid Choukri | Bente Maegaard | Joseph Mariani | Jan Odijk | Stelios Piperidis | Daniel Tapias
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

pdf bib
Condensing Sentences for Subtitle Generation
Prokopis Prokopidis | Vassia Karra | Aggeliki Papagianopoulou | Stelios Piperidis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Text condensation aims at shortening the length of an utterance without losing essential textual information. In this paper, we report on the implementation and preliminary evaluation of a sentence condensation tool for Greek using a manually constructed table of 450 lexical paraphrases, and a set of rules that delete syntactic subtrees that carry minor semantic information. Evaluation on two-sentence sets shows promising results regarding grammaticality and semantic acceptability of compressed versions.

pdf bib
Foundation of a Component-based Flexible Registry for Language Resources and Technology
Daan Broeder | Thierry Declerck | Erhard Hinrichs | Stelios Piperidis | Laurent Romary | Nicoletta Calzolari | Peter Wittenburg
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Within the CLARIN e-science infrastructure project it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the currently available systems with respect to inflexible fixed schema, unsuitable terminology and interoperability problems. The registry will address interoperability needs by referring to a shared vocabulary registered in data category registries as they are suggested by ISO.

pdf bib
Building a Greek corpus for Textual Entailment
Evi Marzelou | Maria Zourari | Voula Giouli | Stelios Piperidis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper reports on completed work aimed at the creation of a resource, namely, the Greek Textual Entailment Corpus (GTEC) that is appropriate for guiding training and evaluation of a system that recognizes Textual Entailment in Greek texts. The corpus of textual units was collected in view of a range of NLP applications, where semantic interpretation is of paramount importance, and it was manually annotated at the level of Textual Entailment. Moreover, a number of linguistic annotations were also integrated that were deemed useful for prospective system developers. The critical issue was the development of a final resource that is re-usable and adaptable to different NLP systems, in order to either enhance their accuracy or to evaluate their output. We are hereby focusing on the methodological issues underpinning data selection and annotation. An initial approach towards the development of a system catering for the automatic Recognition of Textual Entailment in Greek is also presented and preliminary results are reported.

2006

pdf bib
Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology
Maria Gavrilidou | Penny Labropoulou | Stelios Piperidis | Voula Giouli | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Khalid Choukri
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development approach and their final constitution. Building on the experience gained in the project, a production model has been elaborated, suggesting ways and techniques that can be exploited in order to improve LRs production taking into account realistic issues.

2004

pdf bib
Building Parallel Corpora for eContent Professionals
M. Gavrilidou | P. Labropoulou | E. Desipri | V. Giouli | V. Antonopoulos | S. Piperidis
Proceedings of the Workshop on Multilingual Linguistic Resources

pdf bib
ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs
Nicoletta Calzolari | Khalid Choukri | Maria Gavrilidou | Bente Maegaard | Paola Baroni | Hanne Fersøe | Alessandro Lenci | Valérie Mapelli | Monica Monachini | Stelios Piperidis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Multimodal, Multilingual Resources in the Subtitling Process
Stelios Piperidis | Iason Demiros | Prokopis Prokopidis | Peter Vanroose | Anja Hoethker | Walter Daelemans | Elsa Sklavounou | Manos Konstantinou | Yannis Karavidas
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

pdf bib
Multi-level XML-based Corpus Annotation
Harris Papageorgiou | Prokopis Prokopidis | Voula Giouli | Iason Demiros | Alexis Konstantinidis | Stelios Piperidis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

pdf bib
Towards memory and template-based translation synthesis
Christos Malavazos | Stelios Piperidis | George Carayannis
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

pdf bib
An alignment architecture for translation memory bootstrapping
Ioannis Triantafyllou | Iason Demiros | Christos Malavazos | Stelios Piperidis
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

bib
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
M. Gavrilidou | G. Carayannis | S. Markantonatou | S. Piperidis | G. Stainhauer
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Term-based Identification of Sentences for Text Summarisation
Byron Georgantopoulos | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Named Entity Recognition in Greek Texts
Iason Demiros | Sotiris Boutsis | Voula Giouli | Maria Liakata | Harris Papageorgiou | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
A Robust Parser for Unrestricted Greek Text
Sotiris Boutsis | Prokopis Prokopidis | Voula Giouli | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
A Unified POS Tagging Architecture and its Application to Greek
Harris Papageorgiou | Prokopis Prokopidis | Voula Giouli | Stelios Piperidis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Design and Implementation of the Online ILSP Greek Corpus
Nick Hatzigeorgiu | Maria Gavrilidou | Stelios Piperidis | George Carayannis | Anastasia Papakostopoulou | Athanassia Spiliotopoulou | Anna Vacalopoulou | Penny Labropoulou | Elena Mantzari | Harris Papageorgiou | Iason Demiros
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Application of Analogical Modelling to Example Based Machine Translation
Christos Malavazos | Stelios Piperidis
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1999

pdf bib
A Multi-level Framework for Memory-Based Translation Aid Tools
Stelios Piperidis | Christos Malavazos | Ioannis Triantafyllou
Proceedings of Translating and the Computer 21

1998

pdf bib
Aligning Clauses in Parallel Texts
Sotiris Boutsis | Stelios Piperidis
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing

1994

pdf bib
A Matching Technique in Example-Based Machine Translation
Lambros Cranias | Harris Papageorgiou | Stelios Piperdis
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

pdf bib
Interactive Corpus-based Translation Learning Tool (Translearn)
Stelios Piperidis
Proceedings of Translating and the Computer 16

pdf bib
Automatic Alignment in Parallel Corpora
Harris Papageorgiou | Lambros Cranias | Stelios Piperidis
32nd Annual Meeting of the Association for Computational Linguistics
