Thierry Declerck

2023

In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.

This paper reports on the shared tasks organized by the 20th IWSLT Conference. The shared tasks address 9 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, multilingual, dialect and low-resource speech translation, and formality control. The shared tasks attracted a total of 38 submissions by 31 teams. The growing interest in spoken language translation is also reflected in the steadily increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

Linked Open Data compliant Representation of the Interlinking of Nordic Wordnets and Sign Language Data
Thierry Declerck | Sussi Olsen
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

We present ongoing work on a Linked Open Data (LOD) compliant representation of Sign Language (SL) data, with the goal of supporting the cross-lingual linking of SL data, including links to Spoken Language data. As the European EASIER research project has already investigated the use of Open Multilingual Wordnet (OMW) datasets for cross-linking German and Greek SL data, we propose a unified RDF-based representation of OMW and SL data. In this context, we experimented with the transformation into RDF of a rich dataset, which links Danish Sign Language data and the wordnet for Danish, DanNet. We extend this work to other Nordic languages, aiming to support cross-lingual comparisons of Nordic Sign Languages. This unified formal representation offers a semantic repository of information on SL data that could be accessed to support the creation of datasets for training or evaluating NLP applications that involve SLs.

Towards an RDF Representation of the Infrastructure consisting in using Wordnets as a conceptual Interlingua between multilingual Sign Language Datasets
Thierry Declerck | Thomas Troelsgård | Sussi Olsen
Proceedings of the 12th Global Wordnet Conference

We present ongoing work on a Linked Data compliant representation of infrastructures that use wordnets to connect multilingual Sign Language data sets. For this, we build on existing RDF and OntoLex representations of Open Multilingual Wordnet (OMW) data sets and on work done by the European EASIER research project on the use of the CSV files of OMW for linking glosses and basic semantic information associated with Sign Language data sets in two languages: German and Greek. In this context, we started the transformation into RDF of a Danish data set, which links Danish Sign Language data and the wordnet for Danish, DanNet. The final objective of our work is to include Sign Language data sets (and their conceptual cross-linking via wordnets) in the Linguistic Linked Open Data cloud.

Enriching Multiword Terms in Wiktionary with Pronunciation Information
Lenka Bajcetic | Thierry Declerck | Gilles Sérasset
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

We report on work in progress on the automated generation of pronunciation information for English multiword terms (MWTs) in Wiktionary, combining information available for their single components. We describe the issues we encountered, the building of an evaluation dataset, and our collaboration with the DBnary resource maintainer. Our approach shows potential for automatically adding morphosyntactic and semantic information to the components of such MWTs.

For Sign Languages (SLs), can we create a SignNet, like a WordNet for spoken languages: a network of semantic relations between constitutive elements of SLs? We first discuss approaches that link SL data to wordnets, or integrate such elements with some adaptations into the structure of WordNet. Then, we present requirements for a SignNet, which is built on SL data and then linked to WordNet.

We present work on a Linked Open Data (LOD)-compliant representation of Sign Language (SL) data, with the goal of supporting the cross-lingual alignment of SL data and their linking to Spoken Language (SpL) data. The proposed representation builds on the work of groups of SL researchers who have investigated the use of Open Multilingual Wordnet (OMW) datasets for (manually) cross-linking SL data or for linking SL and SpL data. Another group of researchers is proposing an XML encoding of articulatory elements of SLs and (manually) linking those to an SpL lexical resource. We propose an RDF-based representation of those various data. This unified formal representation offers a semantic repository of information on SL and SpL data that could be accessed to support the creation of datasets for training or evaluating NLP applications dealing with SLs, for example Machine Translation (MT) between SLs and between SLs and SpLs.
