Ilan Kernerman


2026

We release MTQE.en-he: to our knowledge, the first publicly available English-Hebrew benchmark for Machine Translation Quality Estimation. MTQE.en-he contains 959 English segments from WMT24++, each paired with a machine translation into Hebrew, and Direct Assessment scores of the translation quality annotated by three human experts. We benchmark ChatGPT prompting, TransQuest, and CometKiwi and show that ensembling the three models outperforms the best single model (CometKiwi) by 6.4 percentage points Pearson and 5.8 percentage points Spearman. Fine-tuning experiments with TransQuest and CometKiwi reveal that full-model updates are sensitive to overfitting and distribution collapse, yet parameter-efficient methods (LoRA, BitFit, and FTHead, i.e., fine-tuning only the classification head) train stably and yield improvements of 2-3 percentage points. MTQE.en-he and our experimental results enable future research on this under-resourced language pair.
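The abstract does not say how the three systems are combined, so the following is only a minimal sketch of one common option: standardise each system's scores and average them, then compare the result with human scores using Pearson and Spearman correlation. The function names and the toy numbers are hypothetical and are not taken from MTQE.en-he.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Standardise scores so systems on different scales can be averaged."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s if s else 0.0 for x in xs]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(xs), ranks(ys))


def ensemble(*system_scores):
    """Assumed ensembling scheme: unweighted mean of per-system z-scores."""
    standardised = [zscore(scores) for scores in system_scores]
    return [mean(column) for column in zip(*standardised)]


# Toy, made-up scores for five segments (illustration only).
human = [80.0, 35.0, 60.0, 90.0, 20.0]
system_a = [0.71, 0.40, 0.55, 0.80, 0.35]
system_b = [0.60, 0.20, 0.65, 0.75, 0.30]
system_c = [4.0, 2.0, 3.0, 5.0, 2.0]

combined = ensemble(system_a, system_b, system_c)
print(round(pearson(combined, human), 3), round(spearman(combined, human), 3))
```

Standardising first keeps a system with a wider score range (here the hypothetical 1-5 scale of `system_c`) from dominating the average; a weighted or learned combination would be an equally plausible alternative.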

2025

This paper presents the integration of the Lexicala Latin–French Dictionary into the LiLa Knowledge Base, a collection of linguistic resources for Latin made interoperable through their publication as Linked Open Data (LOD). The entries of the dictionary are linked to LiLa's large collection of Latin lemmas (the Lemma Bank), enabling interaction with the other resources published therein. The paper details the data modelling process, the linking methodology, and two practical use cases, showing how interlinking resources via LOD can support advances in (multilingual) linguistic research.

2022

The objective of the Translation Inference Across Dictionaries (TIAD) series of shared tasks is to explore and compare methods and techniques that infer translations indirectly between language pairs, based on other bilingual/multilingual lexicographic resources. In this fifth edition, the participating systems were asked to generate new translations automatically among three languages (English, French, and Portuguese) based on known indirect translations contained in the Apertium RDF graph. These evaluation pairs have been the same during the last four TIAD editions. Since the fourth edition, however, a larger graph, namely Apertium RDF v2, has been used as the basis for producing the translations. The evaluation of the results was carried out by the organisers against manually compiled language pairs of K Dictionaries. For the second time in the TIAD series, some systems beat the proposed baselines. This paper gives an overall description of the shared task, the evaluation data and methodology, and the systems’ results.
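As a rough illustration of what inferring translations indirectly can mean, the sketch below composes two bilingual dictionaries through a shared pivot language. It is a naive composition written for this note, not one of the TIAD baselines or participating systems; the function name, the choice of Spanish as pivot, and the toy entries are all assumptions.

```python
from collections import defaultdict


def infer_via_pivot(src_to_pivot, pivot_to_tgt):
    """Propose source-to-target translations by chaining through a pivot.

    Both arguments map a word to a set of translations. Every target word
    reachable through any pivot word is proposed, so polysemous pivot words
    can introduce wrong candidates.
    """
    inferred = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            for tgt_word in pivot_to_tgt.get(pivot_word, ()):
                inferred[src_word].add(tgt_word)
    return dict(inferred)


# Toy, hand-written entries (hypothetical; not taken from Apertium RDF).
en_es = {"bank": {"banco", "orilla"}}
es_fr = {"banco": {"banque", "banc"}, "orilla": {"rive"}}

print(infer_via_pivot(en_es, es_fr))
```

In this toy case the candidate "banc" (bench) is reached only because the pivot word "banco" is ambiguous, which is the kind of over-generation that more careful inference methods try to filter out.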

2020

2019

We present a portfolio of natural legal language processing and document curation services currently under development in a collaborative European project. First, we give an overview of the project and the different use cases, while, in the main part of the article, we focus upon the 13 different processing services that are being deployed in different prototype applications using a flexible and scalable microservices architecture. Their orchestration is operationalised using a content and document curation workflow manager.