Axel-Cyrille Ngonga Ngomo


2024

Error Analysis of Multilingual Language Models in Machine Translation: A Case Study of English-Amharic Translation
Hizkiel Mitiku Alemayehu | Hamada M Zahera | Axel-Cyrille Ngonga Ngomo
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Multilingual large language models (mLLMs) have significantly advanced machine translation, yet challenges remain for low-resource languages like Amharic. This study evaluates the performance of state-of-the-art mLLMs, specifically NLLB-200 (NLLB3.3B, NLLB1.3B, Distilled1.3B, NLLB600M) and M2M (M2M1.2B, M2M418M), in English-Amharic bidirectional translation using the Lesan AI dataset. We employed both automatic and human evaluation methods to analyze translation errors. Automatic evaluation used BLEU, METEOR, chrF, and TER metrics, while human evaluation assessed translation quality at both word and sentence levels. Sentence-level accuracy was rated by annotators on a scale from 0 to 5, and word-level quality was evaluated using Multidimensional Quality Metrics. Our findings indicate that the NLLB3.3B model consistently outperformed other mLLMs across all evaluation methods. Common errors included mistranslation, omission, untranslated segments, and additions, with mistranslation being the most frequent. Punctuation and spelling errors were rare in our experiment.
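To make the character-level metric named in the abstract concrete, here is a minimal sketch of a chrF-style score. It is a simplified illustration, not the evaluation code used in the paper: the function names `char_ngrams` and `simple_chrf` are hypothetical, whitespace is stripped before counting n-grams, and the defaults (n-gram orders 1 to 6, beta = 2) are the commonly cited chrF settings assumed here.

```python
from collections import Counter


def char_ngrams(text, n):
    # Count character n-grams after removing spaces (an assumption of this sketch).
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average n-gram precision and recall over orders 1..max_n,
    # then combine them into an F-beta score (recall weighted by beta).
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(simple_chrf("the cat sat on the mat", "the cat is on the mat"))
```

Because the score works on characters rather than words, it gives partial credit for near-matches in morphologically rich languages, which is one reason such metrics are often reported alongside BLEU for languages like Amharic.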

Benchmarking Low-Resource Machine Translation Systems
Ana Silva | Nikit Srivastava | Tatiana Moteu Ngoli | Michael Röder | Diego Moussallem | Axel-Cyrille Ngonga Ngomo
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)

Assessing the performance of machine translation systems is of critical value, especially for languages with lower resource availability. Due to the large evaluation effort required by the translation task, studies often compare new systems against single systems or commercial solutions. Consequently, determining the best-performing system for specific languages is often unclear. This work benchmarks publicly available translation systems across 4 datasets and 26 languages, including low-resource languages. We consider both effectiveness and efficiency in our evaluation. Our results are made public through BENG, a FAIR benchmarking platform for Natural Language Generation tasks.

2023

REDFM: a Filtered and Multilingual Relation Extraction Dataset
Pere-Lluís Huguet Cabot | Simone Tedeschi | Axel-Cyrille Ngonga Ngomo | Roberto Navigli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English. In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems. First, we present SREDFM, an automatically annotated dataset covering 18 languages, 400 relation types, 13 entity types, totaling more than 40 million triplet instances. Second, we propose REDFM, a smaller, human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems. To demonstrate the utility of these novel datasets, we experiment with the first end-to-end multilingual RE model, mREBEL, that extracts triplets, including entity types, in multiple languages. We release our resources and model checkpoints at [https://www.github.com/babelscape/rebel](https://www.github.com/babelscape/rebel).

2022

CausalQA: A Benchmark for Causal Question Answering
Alexander Bondarenko | Magdalena Wolska | Stefan Heindorf | Lukas Blübaum | Axel-Cyrille Ngonga Ngomo | Benno Stein | Pavel Braslavski | Matthias Hagen | Martin Potthast
Proceedings of the 29th International Conference on Computational Linguistics

At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typology derived from a data-driven, manual analysis of questions from ten large question answering (QA) datasets. Using high-precision lexical rules, we extract causal questions of each type from these datasets to create our corpus. As an initial baseline, the state-of-the-art QA model UnifiedQA achieves a ROUGE-L F1 score of 0.48 on our new benchmark.
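For readers unfamiliar with the reported metric, the following is a minimal sketch of how a ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of two token sequences. It is an illustration only: the function names are hypothetical, tokenization is plain lowercased whitespace splitting, and the paper's actual scoring setup (tokenizer, stemming, aggregation over the corpus) is not specified here and may differ.

```python
def lcs_length(a, b):
    # Dynamic-programming LCS length over two token lists, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    # Precision = LCS / candidate length, recall = LCS / reference length,
    # F1 = harmonic mean of the two.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```

In this toy pair, the LCS is "the cat on the mat" (5 tokens) against two 6-token sequences, so precision and recall are both 5/6 and the F1 score is about 0.83.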

2020

A General Benchmarking Framework for Text Generation
Diego Moussallem | Paramjot Kaur | Thiago Ferreira | Chris van der Lee | Anastasia Shimorina | Felix Conrads | Michael Röder | René Speck | Claire Gardent | Simon Mille | Nikolai Ilinykh | Axel-Cyrille Ngonga Ngomo
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

The RDF-to-text task has recently gained substantial attention due to the continuous growth of RDF knowledge graphs in number and size. Recent studies have focused on systematically comparing RDF-to-text approaches on benchmarking datasets such as WebNLG. Although some evaluation tools have already been proposed for text generation, none of the existing solutions abides by the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles and involves RDF data for the knowledge extraction task. In this paper, we present BENG, a FAIR benchmarking platform for Natural Language Generation (NLG) and Knowledge Extraction systems with a focus on RDF data. BENG builds upon the successful benchmarking platform GERBIL, is open source, and is publicly available along with the data it contains.

2019

A Holistic Natural Language Generation Framework for the Semantic Web
Axel-Cyrille Ngonga Ngomo | Diego Moussallem | Lorenz Bühmann
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

With the ever-growing generation of data for the Semantic Web comes an increasing demand for this data to be made available to non-semantic Web experts. One way of achieving this goal is to translate the languages of the Semantic Web into natural language. We present LD2NL, a framework that allows verbalizing the three key languages of the Semantic Web, i.e., RDF, OWL, and SPARQL. Our framework is based on a bottom-up approach to verbalization. We evaluated LD2NL in an open survey with 86 persons. Our results suggest that our framework can generate verbalizations that are close to natural languages and that can be easily understood by non-experts. Therewith, it enables non-domain experts to interpret Semantic Web data with more than 91% of the accuracy of domain experts.

2018

LIdioms: A Multilingual Linked Idioms Data Set
Diego Moussallem | Mohamed Ahmed Sherif | Diego Esteves | Marcos Zampieri | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Diego Moussallem | Thiago Ferreira | Marcos Zampieri | Maria Claudia Cavalcanti | Geraldo Xexéo | Mariana Neves | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

BENGAL: An Automatic Benchmark Generator for Entity Recognition and Linking
Axel-Cyrille Ngonga Ngomo | Michael Röder | Diego Moussallem | Ricardo Usbeck | René Speck
Proceedings of the 11th International Conference on Natural Language Generation

The manual creation of gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent works show that such gold standards contain a large proportion of mistakes in addition to being difficult to maintain. We hence present Bengal, a novel approach for the automatic generation of such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be readily generated at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on benchmarks in English generated by Bengal and on 16 benchmarks created manually. We show that our approach can be ported easily across languages by presenting results achieved by 4 tools on both Brazilian Portuguese and Spanish. Overall, our results suggest that our automatic benchmark generation approach can create varied benchmarks that have characteristics similar to those of existing benchmarks. Our approach is open-source. Our experimental results are available at http://faturl.com/bengalexpinlg and the code at https://github.com/dice-group/BENGAL.

2016

Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
Key-Sun Choi | Christina Unger | Piek Vossen | Jin-Dong Kim | Noriko Kando | Axel-Cyrille Ngonga Ngomo
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

2014

A tool suite for creating question answering benchmarks
Axel-Cyrille Ngonga Ngomo | Norman Heino | René Speck | Prodromos Malakasiotis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce the BIOASQ suite, a set of open-source Web tools for the creation, assessment and community-driven improvement of question answering benchmarks. The suite comprises three main tools: (1) the annotation tool supports the creation of benchmarks per se. In particular, this tool allows a team of experts to create questions and answers as well as to annotate the latter with documents, document snippets, RDF triples and ontology concepts. While the creation of questions is supported by different views and contextual information pertaining to the same question, the creation of answers is supported by the integration of several search engines and context information to facilitate the retrieval of the said answers as well as their annotation. (2) The assessment tool allows comparing several answers to the same question. Therewith, it can be used to assess the inter-annotator agreement as well as to manually evaluate automatically generated answers. (3) The third tool in the suite, the social network, aims to ensure the sustainability and iterative improvement of the benchmark by empowering communities of experts to provide insights on the questions in the benchmark. The BIOASQ suite has already been used successfully to create the 311 questions comprised in the BIOASQ question answering benchmark. It has also been evaluated by the experts who used it to create the BIOASQ benchmark.

2012

EAGER: Extending Automatically Gazetteers for Entity Recognition
Omer Farukhan Gunes | Tim Furche | Christian Schallhart | Jens Lehmann | Axel-Cyrille Ngonga Ngomo
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP