Dagmar Gromann


2022

pdf bib
Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner | Sina Ahmadi | Elena-Simona Apostol | Julia Bosque-Gil | Christian Chiarcos | Milan Dojchinovski | Katerina Gkirtzou | Jorge Gracia | Dagmar Gromann | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Gilles Sérasset | Ciprian-Octavian Truică
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We rst introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We de ne under-resourced languages with a speci c focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

pdf bib
Systematic Analysis of Image Schemas in Natural Language through Explainable Multilingual Neural Language Processing
Lennart Wachowiak | Dagmar Gromann
Proceedings of the 29th International Conference on Computational Linguistics

In embodied cognition, physical experiences are believed to shape abstract cognition, such as natural language and reasoning. Image schemas were introduced as spatio-temporal cognitive building blocks that capture these recurring sensorimotor experiences. The few existing approaches for automatic detection of image schemas in natural language rely on specific assumptions about word classes as indicators of spatio-temporal events. Furthermore, the lack of sufficiently large, annotated datasets makes evaluation and supervised learning difficult. We propose to build on the recent success of large multilingual pretrained language models and a small dataset of examples from image schema literature to train a supervised classifier that classifies natural language expressions of varying lengths into image schemas. Despite most of the training data being in English with few examples for German, the model performs best in German. Additionally, we analyse the model’s zero-shot performance in Russian, French, and Mandarin. To further investigate the model’s behaviour, we utilize local linear approximations for prediction probabilities that indicate which words in a sentence the model relies on for its final classification decision. Code and dataset are publicly available.

2021

pdf bib
Transforming Term Extraction: Transformer-Based Approaches to Multilingual Term Extraction Across Domains
Christian Lang | Lennart Wachowiak | Barbara Heinisch | Dagmar Gromann
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)
Luis Espinosa-Anke | Dagmar Gromann | Thierry Declerck | Anna Breit | Jose Camacho-Collados | Mohammad Taher Pilehvar | Artem Revenko
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)

2020

pdf bib
CogALex-VI Shared Task: Transrelation - A Robust Multilingual Language Model for Multilingual Relation Identification
Lennart Wachowiak | Christian Lang | Barbara Heinisch | Dagmar Gromann
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

We describe our submission to the CogALex-VI shared task on the identification of multilingual paradigmatic relations building on XLM-RoBERTa (XLM-R), a robustly optimized and multilingual BERT model. In spite of several experiments with data augmentation, data addition and ensemble methods with a Siamese Triple Net, Translrelation, the XLM-R model with a linear classifier adapted to this specific task, performed best in testing and achieved the best results in the final evaluation of the shared task, even for a previously unseen language.

pdf bib
A Cognitively Motivated Approach to Spatial Information Extraction
Chao Xu | Emmanuelle-Anna Dietz Saldanha | Dagmar Gromann | Beihai Zhou
Proceedings of the Third International Workshop on Spatial Language Understanding

Automatic extraction of spatial information from natural language can boost human-centered applications that rely on spatial dynamics. The field of cognitive linguistics has provided theories and cognitive models to address this task. Yet, existing solutions tend to focus on specific word classes, subject areas, or machine learning techniques that cannot provide cognitively plausible explanations for their decisions. We propose an automated spatial semantic analysis (ASSA) framework building on grammar and cognitive linguistic theories to identify spatial entities and relations, bringing together methods of spatial information extraction and cognitive frameworks on spatial language. The proposed rule-based and explainable approach contributes constructions and preposition schemas and outperforms previous solutions on the CLEF-2017 standard dataset.

pdf bib
The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe
Georg Rehm | Katrin Marheinecke | Stefanie Hegele | Stelios Piperidis | Kalina Bontcheva | Jan Hajič | Khalid Choukri | Andrejs Vasiļjevs | Gerhard Backfried | Christoph Prinz | José Manuel Gómez-Pérez | Luc Meertens | Paul Lukowicz | Josef van Genabith | Andrea Lösch | Philipp Slusallek | Morten Irgens | Patrick Gatellier | Joachim Köhler | Laure Le Bars | Dimitra Anastasiou | Albina Auksoriūtė | Núria Bel | António Branco | Gerhard Budin | Walter Daelemans | Koenraad De Smedt | Radovan Garabík | Maria Gavriilidou | Dagmar Gromann | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Jan Odijk | Maciej Ogrodniczuk | Eiríkur Rögnvaldsson | Mike Rosner | Bolette Pedersen | Inguna Skadiņa | Marko Tadić | Dan Tufiș | Tamás Váradi | Kadri Vider | Andy Way | François Yvon
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

2019

pdf bib
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)
Luis Espinosa-Anke | Thierry Declerck | Dagmar Gromann | Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

2018

pdf bib
Proceedings of the Third Workshop on Semantic Deep Learning
Luis Espinosa Anke | Dagmar Gromann | Thierry Declerck
Proceedings of the Third Workshop on Semantic Deep Learning

pdf bib
Comparing Pretrained Multilingual Word Embeddings on an Ontology Alignment Task
Dagmar Gromann | Thierry Declerck
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Hashtag Processing for Enhanced Clustering of Tweets
Dagmar Gromann | Thierry Declerck
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Rich data provided by tweets have beenanalyzed, clustered, and explored in a variety of studies. Typically those studies focus on named entity recognition, entity linking, and entity disambiguation or clustering. Tweets and hashtags are generally analyzed on sentential or word level but not on a compositional level of concatenated words. We propose an approach for a closer analysis of compounds in hashtags, and in the long run also of other types of text sequences in tweets, in order to enhance the clustering of such text documents. Hashtags have been used before as primary topic indicators to cluster tweets, however, their segmentation and its effect on clustering results have not been investigated to the best of our knowledge. Our results with a standard dataset from the Text REtrieval Conference (TREC) show that segmented and harmonized hashtags positively impact effective clustering.

pdf bib
Proceedings of the 2nd Workshop on Semantic Deep Learning (SemDeep-2)
Dagmar Gromann | Thierry Declerck | Georg Heigl
Proceedings of the 2nd Workshop on Semantic Deep Learning (SemDeep-2)