Ricardo Usbeck


RoMe: A Robust Metric for Evaluating Natural Language Generation
Md Rashad Al Hasan Rony | Liubov Kovriguina | Debanjan Chaudhuri | Ricardo Usbeck | Jens Lehmann
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Evaluating Natural Language Generation (NLG) systems is a challenging task. Firstly, the metric should ensure that the generated hypothesis reflects the reference’s semantics. Secondly, it should consider the grammatical quality of the generated sentence. Thirdly, it should be robust enough to handle various surface forms of the generated sentence. Thus, an effective evaluation metric has to be multifaceted. In this paper, we propose an automatic evaluation metric incorporating several core aspects of natural language understanding (language competence, syntactic and semantic variation). Our proposed metric, RoMe, is trained on language features such as semantic similarity combined with tree edit distance and grammatical acceptability, using a self-supervised neural network to assess the overall quality of the generated sentence. Moreover, we perform an extensive robustness analysis of the state-of-the-art methods and RoMe. Empirical results suggest that RoMe has a stronger correlation with human judgment than state-of-the-art metrics in evaluating system-generated sentences across several NLG tasks.

DialoKG: Knowledge-Structure Aware Task-Oriented Dialogue Generation
Md Rashad Al Hasan Rony | Ricardo Usbeck | Jens Lehmann
Findings of the Association for Computational Linguistics: NAACL 2022

Task-oriented dialogue generation is challenging since the underlying knowledge is often dynamic and effectively incorporating knowledge into the learning process is hard. It is particularly challenging to generate both human-like and informative responses in this setting. Recent research has primarily focused on various knowledge distillation methods where the underlying relationship between the facts in a knowledge base is not effectively captured. In this paper, we go one step further and demonstrate how the structural information of a knowledge graph can improve the system’s inference capabilities. Specifically, we propose DialoKG, a novel task-oriented dialogue system that effectively incorporates knowledge into a language model. Our proposed system views relational knowledge as a knowledge graph and introduces (1) a structure-aware knowledge embedding technique, and (2) a knowledge graph-weighted attention masking strategy to help the system select relevant information during dialogue generation. An empirical evaluation demonstrates the effectiveness of DialoKG over state-of-the-art methods on several standard benchmark datasets.


Proxy Indicators for the Quality of Open-domain Dialogues
Rostislav Nedelchev | Jens Lehmann | Ricardo Usbeck
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work done in the field, human judges still have to evaluate dialogue quality. As a consequence, performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark as an indicator of the quality of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives on judging the quality of a conversation, thus reducing the need for additional training data or responses that serve as quality references. Because of this design, the method can infer various quality metrics and can derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.
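
As a rough illustration of what a correlation between automatic scores and human judgments measures, the sketch below computes a rank correlation in plain Python. The abstract does not state which coefficient was used; Spearman's rho is an assumption made here, and the function names and the example numbers are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers for illustration only (not results from the paper).
proxy_scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical automatic scores
human_scores = [1, 3, 2, 5]            # hypothetical human ratings
print(spearman(proxy_scores, human_scores))
```

A rank-based coefficient only asks whether the automatic score orders dialogues the same way human raters do, which is why it is a common choice for comparing metrics against ordinal human ratings.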


Message Passing for Hyper-Relational Knowledge Graphs
Mikhail Galkin | Priyansh Trivedi | Gaurav Maheshwari | Ricardo Usbeck | Jens Lehmann
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Hyper-relational knowledge graphs (KGs) (e.g., Wikidata) enable associating additional key-value pairs with the main triple to disambiguate or restrict the validity of a fact. In this work, we propose StarE, a message-passing-based graph encoder capable of modeling such hyper-relational KGs. Unlike existing approaches, StarE can encode an arbitrary number of additional key-value pairs (qualifiers) along with the main triple while keeping the semantic roles of qualifiers and triples intact. We also demonstrate that existing benchmarks for evaluating link prediction (LP) performance on hyper-relational KGs suffer from fundamental flaws, and we therefore develop a new Wikidata-based dataset, WD50K. Our experiments demonstrate that a StarE-based LP model outperforms existing approaches across multiple benchmarks. We also confirm that leveraging qualifiers is vital for link prediction, with gains of up to 25 MRR points compared to triple-based representations.
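
To make the notion of a hyper-relational fact concrete, the sketch below models one as a main triple plus qualifier key-value pairs. It illustrates only the data structure, not the StarE encoder; the class name `Statement`, its fields, and the placeholder identifiers are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Statement:
    """Hypothetical container for one hyper-relational fact."""
    subject: str
    relation: str
    obj: str
    # Qualifiers: extra key-value pairs that disambiguate or restrict the triple.
    qualifiers: Tuple[Tuple[str, str], ...] = ()

    def triple(self) -> Tuple[str, str, str]:
        """Return the triple-only view, which drops all qualifiers."""
        return (self.subject, self.relation, self.obj)


# Placeholder identifiers for illustration only (not taken from WD50K).
fact = Statement(
    subject="person_1",
    relation="educated_at",
    obj="university_1",
    qualifiers=(("degree", "bachelor"), ("major", "mathematics")),
)

print(fact.triple())
print(dict(fact.qualifiers))
```

The `triple()` view corresponds to the triple-based representation the abstract compares against: two facts that differ only in their qualifiers collapse into the same triple once the qualifiers are dropped.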

Language Model Transformers as Evaluators for Open-domain Dialogues
Rostislav Nedelchev | Jens Lehmann | Ricardo Usbeck
Proceedings of the 28th International Conference on Computational Linguistics

Computer-based systems for communication with humans have been a cornerstone of AI research since the 1950s. So far, the most effective way to assess the quality of the dialogues produced by these systems is to use resource-intensive manual labor instead of automated means. In this work, we investigate whether language models (LMs) based on transformer neural networks can indicate the quality of a conversation. In a general sense, language models are methods that learn to predict one or more words based on an already given context. Due to their unsupervised nature, they are candidates for efficient, automatic indication of dialogue quality. We demonstrate a positive correlation between the output of the language models and the scores given by human evaluators. We also provide some insights into their behavior and inner workings in a conversational context.

Treating Dialogue Quality Evaluation as an Anomaly Detection Problem
Rostislav Nedelchev | Ricardo Usbeck | Jens Lehmann
Proceedings of the 12th Language Resources and Evaluation Conference

Dialogue systems for interaction with humans have been enjoying increased popularity in research and industry. To this day, the best way to estimate their success is by means of human evaluation rather than automated approaches, despite the abundance of work done in the field. In this paper, we investigate the effectiveness of treating dialogue evaluation as an anomaly detection task. The paper looks into four dialogue modeling approaches and how their objective functions correlate with human annotation scores. A high-level perspective exhibits negative results. However, a more in-depth look shows some potential for using anomaly detection for evaluating dialogues.

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Georg Rehm | Dimitris Galanis | Penny Labropoulou | Stelios Piperidis | Martin Welß | Ricardo Usbeck | Joachim Köhler | Miltos Deligiannis | Katerina Gkirtzou | Johannes Fischer | Christian Chiarcos | Nils Feldhus | Julian Moreno-Schneider | Florian Kintzel | Elena Montiel | Víctor Rodríguez Doncel | John Philip McCrae | David Laqua | Irina Patricia Theile | Christian Dittmar | Kalina Bontcheva | Ian Roberts | Andrejs Vasiļjevs | Andis Lagzdiņš
Proceedings of the 1st International Workshop on Language Technology Platforms

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.


BENGAL: An Automatic Benchmark Generator for Entity Recognition and Linking
Axel-Cyrille Ngonga Ngomo | Michael Röder | Diego Moussallem | Ricardo Usbeck | René Speck
Proceedings of the 11th International Conference on Natural Language Generation

The manual creation of gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent works show that such gold standards contain a large proportion of mistakes in addition to being difficult to maintain. We hence present Bengal, a novel approach for the automatic generation of such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be readily generated at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on benchmarks in English generated by Bengal and on 16 benchmarks created manually. We show that our approach can be ported easily across languages by presenting results achieved by 4 tools on both Brazilian Portuguese and Spanish. Overall, our results suggest that our automatic benchmark generation approach can create varied benchmarks that have characteristics similar to those of existing benchmarks. Our approach is open-source. Our experimental results are available at http://faturl.com/bengalexpinlg and the code at https://github.com/dice-group/BENGAL.


NIF4OGGD - NLP Interchange Format for Open German Governmental Data
Mohamed Sherif | Sandro Coelho | Ricardo Usbeck | Sebastian Hellmann | Jens Lehmann | Martin Brümmer | Andreas Both
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In the last couple of years, the amount of structured open government data has increased significantly. Citizens are already able to leverage the advantages of open data through increased transparency and better opportunities to take part in governmental decision-making processes. Our approach increases the interoperability of existing but distributed open governmental datasets by converting them to the RDF-based NLP Interchange Format (NIF). Furthermore, we integrate the converted data into a geodata store and present a user interface for querying this data via a keyword-based search. The language resource generated in this project is publicly available for download and also via a dedicated SPARQL endpoint.

N³ - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format
Michael Röder | Ricardo Usbeck | Sebastian Hellmann | Daniel Gerber | Andreas Both
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Extracting Linked Data following the Semantic Web principles from unstructured sources has become a key challenge for scientific research. Named Entity Recognition and Disambiguation are two basic operations in this extraction process. One step towards the realization of the Semantic Web vision and the development of highly accurate tools is the availability of data for validating the quality of processes for Named Entity Recognition and Disambiguation as well as for algorithm tuning. This article presents three novel, manually curated and annotated corpora (N³). All of them are based on a free license and stored in the NLP Interchange Format to leverage the Linked Data character of our datasets.