Andrea Lösch


2022

We describe the language technology (LT) assessments carried out in the ELRC action (European Language Resource Coordination) of the European Commission, which aims towards minimising language barriers across the EU. We zoom in on the two most extensive assessments. These LT specifications do not only involve experiments with tools and techniques but also an extensive consultation round with stakeholders from public organisations, academia and industry, in order to gather insights into scenarios and best practices. The LT specifications concern (1) the field of automated anonymisation, which is motivated by the need of public and other organisations to be able to store and share data, and (2) the field of multilingual fake news processing, which is motivated by the increasingly pressing problem of disinformation and the limited language coverage of systems for automatically detecting misleading articles. For each specification, we set up a corresponding proof-of-concept software to demonstrate the opportunities and challenges involved in the field.

2020

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.
Data is key in training modern language technologies. In this paper, we summarise the findings of the first pan-European study on obstacles to sharing language data across 29 EU Member States and CEF-affiliated countries carried out under the ELRC White Paper action on Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe. Why Language Data Matters. We present the methodology of the study, the obstacles identified and report on recommendations on how to overcome those. The obstacles are classified into (1) lack of appreciation of the value of language data, (2) structural challenges, (3) disposition towards CAT tools and lack of digital skills, (4) inadequate language data management practices, (5) limited access to outsourced translations, and (6) legal concerns. Recommendations are grouped into addressing the European/national policy level, and the organisational/institutional level.

2018