Andrejs Vasiļjevs

Also published as: Andrejs Vasiljevs


2021

European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm | Stelios Piperidis | Kalina Bontcheva | Jan Hajic | Victoria Arranz | Andrejs Vasiļjevs | Gerhard Backfried | Jose Manuel Gomez-Perez | Ulrich Germann | Rémi Calizzano | Nils Feldhus | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Julian Moreno-Schneider | Dimitris Galanis | Penny Labropoulou | Miltos Deligiannis | Katerina Gkirtzou | Athanasia Kolovou | Dimitris Gkoumas | Leon Voukoutis | Ian Roberts | Jana Hamrlova | Dusan Varis | Lukas Kacena | Khalid Choukri | Valérie Mapelli | Mickaël Rigault | Julija Melnika | Miro Janosik | Katja Prinz | Andres Garcia-Silva | Cristian Berrio | Ondrej Klejch | Steve Renals
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is targeted to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approx. 1300 services for all European languages as well as thousands of data sets.

2020

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Georg Rehm | Katrin Marheinecke | Stefanie Hegele | Stelios Piperidis | Kalina Bontcheva | Jan Hajič | Khalid Choukri | Andrejs Vasiļjevs | Gerhard Backfried | Christoph Prinz | José Manuel Gómez-Pérez | Luc Meertens | Paul Lukowicz | Josef van Genabith | Andrea Lösch | Philipp Slusallek | Morten Irgens | Patrick Gatellier | Joachim Köhler | Laure Le Bars | Dimitra Anastasiou | Albina Auksoriūtė | Núria Bel | António Branco | Gerhard Budin | Walter Daelemans | Koenraad De Smedt | Radovan Garabík | Maria Gavriilidou | Dagmar Gromann | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Jan Odijk | Maciej Ogrodniczuk | Eiríkur Rögnvaldsson | Mike Rosner | Bolette Pedersen | Inguna Skadiņa | Marko Tadić | Dan Tufiș | Tamás Váradi | Kadri Vider | Andy Way | François Yvon
Proceedings of the 12th Language Resources and Evaluation Conference

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

European Language Grid: An Overview
Georg Rehm | Maria Berger | Ela Elsholz | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Stelios Piperidis | Miltos Deligiannis | Dimitris Galanis | Katerina Gkirtzou | Penny Labropoulou | Kalina Bontcheva | David Jones | Ian Roberts | Jan Hajič | Jana Hamrlová | Lukáš Kačena | Khalid Choukri | Victoria Arranz | Andrejs Vasiļjevs | Orians Anvari | Andis Lagzdiņš | Jūlija Meļņika | Gerhard Backfried | Erinç Dikici | Miroslav Janosik | Katja Prinz | Christoph Prinz | Severin Stampler | Dorothea Thomas-Aniola | José Manuel Gómez-Pérez | Andres Garcia Silva | Christian Berrío | Ulrich Germann | Steve Renals | Ondrej Klejch
Proceedings of the 12th Language Resources and Evaluation Conference

With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented – by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 national competence centres and the European LT Council for outreach and coordination purposes.

The Competitiveness Analysis of the European Language Technology Market
Andrejs Vasiļjevs | Inguna Skadiņa | Indra Samite | Kaspars Kauliņš | Ēriks Ajausks | Jūlija Meļņika | Aivars Bērziņš
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents the key results of a study on the global competitiveness of the European Language Technology market for three areas – Machine Translation, speech technology, and cross-lingual search. EU competitiveness is analyzed in comparison to North America and Asia. The study focuses on seven dimensions (research, innovations, investments, market dominance, industry, infrastructure, and Open Data) that have been selected to characterize the language technology market. The study concludes that while Europe still has strong positions in Research and Innovation, it lags behind North America and Asia in scaling innovations and conquering market share.

Proceedings of the 1st International Workshop on Language Technology Platforms
Georg Rehm | Kalina Bontcheva | Khalid Choukri | Jan Hajič | Stelios Piperidis | Andrejs Vasiļjevs
Proceedings of the 1st International Workshop on Language Technology Platforms

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Georg Rehm | Dimitris Galanis | Penny Labropoulou | Stelios Piperidis | Martin Welß | Ricardo Usbeck | Joachim Köhler | Miltos Deligiannis | Katerina Gkirtzou | Johannes Fischer | Christian Chiarcos | Nils Feldhus | Julian Moreno-Schneider | Florian Kintzel | Elena Montiel | Víctor Rodríguez Doncel | John Philip McCrae | David Laqua | Irina Patricia Theile | Christian Dittmar | Kalina Bontcheva | Ian Roberts | Andrejs Vasiļjevs | Andis Lagzdiņš
Proceedings of the 1st International Workshop on Language Technology Platforms

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest to implement in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.

A Tale of Eight Countries or the EU Council Presidency Translator in Retrospect
Mārcis Pinnis | Toms Bergmanis | Kristīne Metuzāle | Valters Šics | Artūrs Vasiļevskis | Andrejs Vasiļjevs
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

2019

Competitiveness Analysis of the European Machine Translation Market
Andrejs Vasiļjevs | Inguna Skadiņa | Indra Sāmīte | Kaspars Kauliņš | Ēriks Ajausks | Jūlija Meļņika | Aivars Bērziņš
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

2018

European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management
Andrea Lösch | Valérie Mapelli | Stelios Piperidis | Andrejs Vasiļjevs | Lilli Smal | Thierry Declerck | Eileen Schnur | Khalid Choukri | Josef van Genabith
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Tilde MT Platform for Developing Client Specific MT Solutions
Mārcis Pinnis | Andrejs Vasiļjevs | Rihards Kalniņš | Roberts Rozis | Raivis Skadiņš | Valters Šics
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Collecting Language Resources from Public Administrations in the Nordic and Baltic Countries
Andrejs Vasiļjevs | Rihards Kalniņš | Roberts Rozis | Aivars Bērziņš
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Collecting Language Resources for the Latvian e-Government Machine Translation Platform
Roberts Rozis | Andrejs Vasiļjevs | Raivis Skadiņš
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes corpora collection activity for building large machine translation systems for the Latvian e-Government platform. We describe requirements for corpora, selection and assessment of data sources, collection of the public corpora and creation of new corpora from miscellaneous sources. Methodology, tools and assessment methods are also presented along with the results achieved, challenges faced and conclusions made. Several approaches to address data scarcity are discussed. We summarize the volume of obtained corpora and provide quality metrics of MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English and Latvian-Russian are integrated in the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action.

Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities
Georg Rehm | Jan Hajič | Josef van Genabith | Andrejs Vasiljevs
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

META-NET is a European network of excellence, founded in 2010, that consists of 60 research centres in 34 European countries. One of the key visions and goals of META-NET is a truly multilingual Europe, which is substantially supported and realised through language technologies. In this article we provide an overview of recent developments around the multilingual Europe topic, we also describe recent and upcoming events as well as recent and upcoming strategy papers. Furthermore, we provide overviews of two new emerging initiatives, the CEF.AT and ELRC activity on the one hand and the Cracking the Language Barrier federation on the other. The paper closes with several suggested next steps in order to address the current challenges and to open up new opportunities.

2014

How to overtake Google in MT quality - the Baltic case
Andrejs Vasiljevs
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

Application of machine translation in localization into low-resourced languages
Raivis Skadiņš | Mārcis Pinnis | Andrejs Vasiļjevs | Inguna Skadiņa | Tomas Hudik
Proceedings of the 17th Annual conference of the European Association for Machine Translation

Terminology localization guidelines for the national scenario
Juris Borzovs | Ilze Ilziņa | Iveta Keiša | Mārcis Pinnis | Andrejs Vasiļjevs
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a set of principles and practical guidelines for terminology work in the national scenario to ensure a harmonized approach in term localization. These linguistic principles and guidelines are elaborated by the Terminology Commission in Latvia in the domain of Information and Communication Technology (ICT). We also present a novel approach to corpus-based selection and evaluation of the most frequently used terms. Analysis of the terms proves that, in general, in the normative terminology work in Latvia localized terms are coined according to these guidelines. We further evaluate how terms included in the database of official terminology are adopted in general use such as newspaper articles, blogs, forums, websites etc. Our evaluation shows that in a non-normative context the official terminology faces strong competition from other variations of localized terms. Conclusions and recommendations from lexical analysis of localized terms are provided. We hope that the presented guidelines and evaluation approach will be useful to terminology institutions, regulative authorities and researchers in different countries that are involved in the national terminology work.

The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.

Terminology Resources and Terminology Work Benefit from Cloud Services
Tatiana Gornostay | Andrejs Vasiļjevs
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the concept of the innovative platform TaaS “Terminology as a Service”. TaaS brings the benefits of cloud services to the user, in order to foster the creation of terminology resources and to maintain their up-to-datedness by integrating automated data extraction and user-supported clean-up of raw terminological data and sharing user-validated terminology. The platform is based on cutting-edge technologies, provides single-access-point terminology services, and facilitates the establishment of emerging trends beyond conventional praxis and static models in terminology work. A cloud-based, user-oriented, collaborative, portable, interoperable, and multilingual platform offers such terminology services as terminology project creation and sharing, data collection for translation lookup, user document upload and management, terminology extraction customisation and execution, raw terminological data management, validated terminological data export and reuse, and other terminology services.

Real-world challenges in application of MT for localization: the Baltic case
Mārcis Pinnis | Raivis Skadiņš | Andrejs Vasiļjevs
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Users Track

Machine translation for e-government – the Baltic case
Andrejs Vasiļjevs | Rihards Kalniņš | Mārcis Pinnis | Raivis Skadiņš
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Users Track

2013

Baltic and Nordic Parts of the European Linguistic Infrastructure
Inguna Skadiņa | Andrejs Vasiļjevs | Lars Borin | Krister Lindén | Gyri Losnegaard | Sussi Olsen | Bolette Sandford Pedersen | Roberts Rozis | Koenraad De Smedt
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

Application of Online Terminology Services in Statistical Machine Translation
Raivis Skadins | Marcis Pinnis | Tatiana Gornostay | Andrejs Vasiljevs
Proceedings of Machine Translation Summit XIV: Posters

TaaS: Terminology as a Service
Andrejs Vasiljevs | Tatiana Gornostay
Proceedings of Machine Translation Summit XIV: European projects

2012

Creation of an Open Shared Language Resource Repository in the Nordic and Baltic Countries
Andrejs Vasiļjevs | Markus Forsberg | Tatiana Gornostay | Dorte Haltrup Hansen | Kristín Jóhannsdóttir | Gunn Lyse | Krister Lindén | Lene Offersgaard | Sussi Olsen | Bolette Pedersen | Eiríkur Rögnvaldsson | Inguna Skadiņa | Koenraad De Smedt | Ville Oksanen | Roberts Rozis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The META-NORD project has contributed to an open infrastructure for language resources (data and tools) under the META-NET umbrella. This paper presents the key objectives of META-NORD and reports on the results achieved in the first year of the project. META-NORD has mapped and described the national language technology landscape in the Nordic and Baltic countries in terms of language use, language technology and resources, main actors in the academy, industry, government and society; identified and collected the first batch of language resources in the Nordic and Baltic countries; documented, processed, linked, and upgraded the identified language resources to agreed standards and guidelines. The three horizontal multilingual actions in META-NORD are overviewed in this paper: linking and validating Nordic and Baltic wordnets, the harmonisation of multilingual Nordic and Baltic treebanks, and consolidating multilingual terminology resources across European countries. This paper also touches upon intellectual property rights for the sharing of language resources.

Collecting and Using Comparable Corpora for Statistical Machine Translation
Inguna Skadiņa | Ahmet Aker | Nikos Mastropavlos | Fangzhong Su | Dan Tufis | Mateja Verlic | Andrejs Vasiļjevs | Bogdan Babych | Paul Clough | Robert Gaizauskas | Nikos Glaros | Monica Lestari Paramita | Mārcis Pinnis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Lack of sufficient parallel data for many languages and domains is currently one of the major obstacles to further advancement of automated translation. The ACCURAT project is addressing this issue by researching methods for improving machine translation systems by using comparable corpora. In this paper we present tools and techniques developed in the ACCURAT project that allow additional data needed for statistical machine translation to be extracted from comparable corpora. We present methods and tools for acquisition of comparable corpora from the Web and other sources, for evaluation of the comparability of collected corpora, for multi-level alignment of comparable corpora and for extraction of lexical and terminological data for machine translation. Finally, we present initial evaluation results on the utility of collected corpora in domain-adapted machine translation and real-life applications.

LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
Andrejs Vasiļjevs | Raivis Skadiņš | Jörg Tiedemann
Proceedings of the ACL 2012 System Demonstrations

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
Mārcis Pinnis | Radu Ion | Dan Ştefănescu | Fangzhong Su | Inguna Skadiņa | Andrejs Vasiļjevs | Bogdan Babych
Proceedings of the ACL 2012 System Demonstrations

2011

LetsMT!: Cloud-Based Platform for Building User Tailored Machine Translation Engines
Andrejs Vasiljevs | Raivis Skadinš | Jörg Tiedemann
Proceedings of Machine Translation Summit XIII: System Presentations

META-NORD: Towards Sharing of Language Resources in Nordic and Baltic Countries
Inguna Skadiņa | Andrejs Vasiļjevs | Lars Borin | Koenraad De Smedt | Krister Lindén | Eiríkur Rögnvaldsson
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

Evaluation of SMT in localization to under-resourced inflected language
Raivis Skadiņš | Maris Puriņš | Inguna Skadiņa | Andrejs Vasiļjevs
Proceedings of the 15th Annual conference of the European Association for Machine Translation

2010

Corpus Based Analysis for Multilingual Terminology Entry Compounding
Andrejs Vasiljevs | Kaspars Balodis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes statistical analysis methods for improving terminology entry compounding. Terminology entry compounding is a mechanism that identifies matching entries across multiple multilingual terminology collections. Bilingual or trilingual term entries are unified into a compounded multilingual entry. We suggest that corpus analysis can improve entry compounding results by analysing the contextual terms of a given term in the corpus data. The proposed algorithm is described and implemented in an experimental setup. Results of an experiment on compounding Latvian and Lithuanian terminology resources are provided. These results encourage further research for different language pairs and in different domains.

Bridging the Gap – EuroTermBank Terminology Delivered to Users’ Environment
Tatiana Gornostay | Andrejs Vasiljevs | Signe Rirdance | Roberts Rozis
Proceedings of the 14th Annual conference of the European Association for Machine Translation

2007

Development of Text-To-Speech system for Latvian
Kārlis Goba | Andrejs Vasiļjevs
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

Comprehension Assistant for Languages of Baltic States
Inguna Skadiņa | Andrejs Vasiļjevs | Daiga Deksne | Raivis Skadiņš | Linda Goldberga
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

EuroTermBank - a Terminology Resource based on Best Practice
Lina Henriksen | Claus Povlsen | Andrejs Vasiljevs
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The new EU member countries face the problems of terminology resource fragmentation and lack of coordination in terminology development in general. The EuroTermBank project aims at contributing to improve the terminology infrastructure of the new EU countries and the project will result in a centralized online terminology bank - interlinked to other terminology banks and resources - for languages of the new EU member countries. The main focus of this paper is on a description of how to identify best practice within terminology work seen from a broad perspective. Surveys of real life terminology work have been conducted and these surveys have resulted in identification of scenario specific best practice descriptions of terminology work. Furthermore, this paper will present an outline of the specific criteria that have been used for selection of existing term resources to be included in the EuroTermBank database.