Frédéric Herledan

Also published as: Frederic Herledan

2025

Question-parsing with Abstract Meaning Representation enhanced by adding small datasets
Johannes Heinecke | Maria Boritchev | Frédéric Herledan
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Abstract Meaning Representation (AMR) is a graph-based formalism for representing meaning in sentences. As the annotation is quite complex, few annotated corpora exist. The most well-known and widely-used corpora are LDC’s AMR 3.0 and the datasets available on the new AMR website. Models trained on the LDC corpora work fine on texts with similar genre and style: sentences extracted from news articles, Wikipedia articles. However, other types of texts, in particular questions, are less well processed by models trained on this data. We analyse how adding few sentence-type specific annotations can steer the model to improve parsing in the case of questions in English.

2022

pdf bib abs

Knowledge Extraction From Texts Based on Wikidata
Anastasia Shimorina | Johannes Heinecke | Frédéric Herledan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track

This paper presents an effort within our company of developing knowledge extraction pipeline for English, which can be further used for constructing an entreprise-specific knowledge base. We present a system consisting of entity detection and linking, coreference resolution, and relation extraction based on the Wikidata schema. We highlight existing challenges of knowledge extraction by evaluating the deployed pipeline on real-world data. We also make available a database, which can serve as a new resource for sentential relation extraction, and we underline the importance of having balanced data for training classification models.

2021

pdf bib

Génération automatique de texte en langage naturel pour les systèmes de questions-réponses [Compact answer generation with a Transformer based approach]
Imen Akermi | Johannes Heinecke | Frédéric Herledan
Traitement Automatique des Langues, Volume 62, Numéro 1 : Varia [Varia]

2020

pdf bib abs

Transformer based Natural Language Generation for Question-Answering
Imen Akermi | Johannes Heinecke | Frédéric Herledan
Proceedings of the 13th International Conference on Natural Language Generation

This paper explores Natural Language Generation within the context of Question-Answering task. The several works addressing this task only focused on generating a short answer or a long text span that contains the answer, while reasoning over a Web page or processing structured data. Such answers’ length are usually not appropriate as the answer tend to be perceived as too brief or too long to be read out loud by an intelligent assistant. In this work, we aim at generating a concise answer for a given question using an unsupervised approach that does not require annotated data. Tested over English and French datasets, the proposed approach shows very promising results.

pdf bib abs

Approche de génération de réponse à base de transformers (Transformer based approach for answer generation)
Imen Akermi | Johannes Heinecke | Frédéric Herledan
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Cet article présente une approche non-supervisée basée sur les modèles Transformer pour la génération du langage naturel dans le cadre des systèmes de question-réponse. Cette approche permettrait de remédier à la problématique de génération de réponse trop courte ou trop longue sans avoir recours à des données annotées. Cette approche montre des résultats prometteurs pour l’anglais et le français.

2019

pdf bib abs

CALOR-QUEST : un corpus d’entraînement et d’évaluation pour la compréhension automatique de textes (Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document)
Frederic Bechet | Cindy Aloui | Delphine Charlet | Geraldine Damnati | Johannes Heinecke | Alexis Nasr | Frederic Herledan
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

La compréhension automatique de texte est une tâche faisant partie de la famille des systèmes de Question/Réponse où les questions ne sont pas à portée générale mais sont liées à un document particulier. Récemment de très grand corpus (SQuAD, MS MARCO) contenant des triplets (document, question, réponse) ont été mis à la disposition de la communauté scientifique afin de développer des méthodes supervisées à base de réseaux de neurones profonds en obtenant des résultats prometteurs. Ces méthodes sont cependant très gourmandes en données d’apprentissage, données qui n’existent pour le moment que pour la langue anglaise. Le but de cette étude est de permettre le développement de telles ressources pour d’autres langues à moindre coût en proposant une méthode générant de manière semi-automatique des questions à partir d’une analyse sémantique d’un grand corpus. La collecte de questions naturelle est réduite à un ensemble de validation/test. L’application de cette méthode sur le corpus CALOR-Frame a permis de développer la ressource CALOR-QUEST présentée dans cet article.

pdf bib abs

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Frederic Bechet | Cindy Aloui | Delphine Charlet | Geraldine Damnati | Johannes Heinecke | Alexis Nasr | Frederic Herledan
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document. Recently very large corpora (SQuAD, MS MARCO) containing triplets (document, question, answer) were made available to the scientific community to develop supervised methods based on deep neural networks with promising results. These methods need very large training corpus to be efficient, however such kind of data only exists for English and Chinese at the moment. The aim of this study is the development of such resources for other languages by proposing to generate in a semi-automatic way questions from the semantic Frame analysis of large corpora. The collect of natural questions is reduced to a validation/test set. We applied this method on the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.

pdf bib abs

We present a spoken conversational question answering proof of concept that is able to answer questions about general knowledge from Wikidata. The dialogue agent does not only orchestrate various agents but also solve coreferences and ellipsis.