Julien Perez

2025

Approche générative de la conformation pragmatique : une étude de cas de l’analyse d’une conférence
Julien Perez | Idir Benouaret
Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM)

La relecture en double aveugle est centrale dans les conférences scientifiques, mais des biais persistent. OpenReview a introduit plus de transparence en rendant publics les articles, les évaluations et les décisions. Ce travail explore l’utilisation des grands modèles de langage (LLMs) pour assister différentes étapes du processus de relecture : production de méta-revues, détection de biais et de subjectivité dans les évaluations. L’étude s’appuie sur les données ICLR de 2017 à 2022 et inclut des analyses quantitatives et des évaluations humaines à l’aveugle. Les résultats visent à encourager une relecture scientifique plus efficace et équitable.

pdf bib abs

Évaluation automatique du retour à la source dans un contexte historique long et bruité. Application aux débats parlementaires de la Troisième République française
Julien Perez | Aurélien Pellet | Marie Puren
Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM)

Dans le contexte de l’utilisation croissante des LLM, le besoin d’un retour efficace et automatique aux sources devient essentiel, en particulier pour les documents historiques. La capacité des LLM à identifier les sources pertinentes ne constitue plus seulement un maillon dans une chaîne où l’objectif final est la génération de réponses ; elle représente un enjeu fondamental de l’analyse, justifiant une évaluation à part entière. Quelles stratégies, quels modèles et quels paramètres offrent aux historiens les meilleures capacités d’exploration d’un corpus vaste et bruité ? Cet article propose une première tentative d’évaluation du retriever dans un cadre de RAG appliqué aux débats parlementaires de la Troisième République.

pdf bib abs

Évaluation pédagogique du code à l’aide de grands modèles de langage. Une étude comparative à grande échelle contre les tests unitaires
Julien Perez | Anton Conrad | Elkoussy Laïla
Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM)

L’évaluation automatisée en éducation par projet pour l’apprentissage de la programmation s’appuie traditionnellement sur les tests unitaires pour juger les soumissions de code des étudiants, mettant l’accent sur la correction fonctionnelle. Cependant, ces tests négligent souvent des aspects qualitatifs du code, comme la lisibilité ou la modularité. Cette étude examine le potentiel des grands modèles de langage (LLM) pour évaluer les soumissions de programmation, en comparant leurs résultats à ceux des tests unitaires. À partir d’un grand ensemble de données de rendus d’étudiants à une collection de projets de développement logiciel, nous appliquons des analyses statistiques, modélisations prédictives, ainsi que plusieurs comparaisons pour évaluer l’efficacité des LLMs. Nos résultats mettent en évidence une corrélation significative entre les évaluations des LLMs, pour des prompts donnés, et les tests unitaires. Les modèles prédictifs montrent que les scores des LLMs peuvent être approximés à partir des résultats des tests unitaires, et les classements d’étudiants issus des deux approches sont fortement corrélés. Ces constats restent robustes même en présence de bruit injecté dans les rendus étudiants. Ces résultats suggèrent que les LLM, en capturant des dimensions supplémentaires de la performance, peuvent enrichir les cadres d’évaluation éducative, offrant une approche totale plus nuancée et complète.

2021

pdf bib abs

Globalizing BERT-based Transformer Architectures for Long Document Summarization
Quentin Grail | Julien Perez | Eric Gaussier
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Fine-tuning a large language model on downstream tasks has become a commonly adopted process in the Natural Language Processing (NLP) (CITATION). However, such a process, when associated with the current transformer-based (CITATION) architectures, shows several limitations when the target task requires to reason with long documents. In this work, we introduce a novel hierarchical propagation layer that spreads information between multiple transformer windows. We adopt a hierarchical approach where the input is divided in multiple blocks independently processed by the scaled dot-attentions and combined between the successive layers. We validate the effectiveness of our approach on three extractive summarization corpora of long scientific papers and news articles. We compare our approach to standard and pre-trained language-model-based summarizers and report state-of-the-art results for long document summarization and comparable results for smaller document summarization.

2018

pdf bib

Adversarial Networks for Machine Reading
Quentin Grail | Julien Perez | Tomi Silander
Traitement Automatique des Langues, Volume 59, Numéro 2 : Apprentissage profond pour le traitement automatique des langues [Deep Learning for natural language processing]

2017

pdf bib abs

Dialog state tracking, a machine reading approach using Memory Network
Julien Perez | Fei Liu
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In an end-to-end dialog system, the aim of dialog state tracking is to accurately estimate a compact representation of the current dialog status from a sequence of noisy observations produced by the speech recognition and the natural language understanding modules. This paper introduces a novel method of dialog state tracking based on the general paradigm of machine reading and proposes to solve it using an End-to-End Memory Network, MemN2N, a memory-enhanced neural network architecture. We evaluate the proposed approach on the second Dialog State Tracking Challenge (DSTC-2) dataset. The corpus has been converted for the occasion in order to frame the hidden state variable inference as a question-answering task based on a sequence of utterances extracted from a dialog. We show that the proposed tracker gives encouraging results. Then, we propose to extend the DSTC-2 dataset with specific reasoning capabilities requirement like counting, list maintenance, yes-no question answering and indefinite knowledge management. Finally, we present encouraging results using our proposed MemN2N based tracking model.

pdf bib abs

A Language-independent and Compositional Model for Personality Trait Recognition from Short Texts
Fei Liu | Julien Perez | Scott Nowson
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

There have been many attempts at automatically recognising author personality traits from text, typically incorporating linguistic features with conventional machine learning models, e.g. linear regression or Support Vector Machines. In this work, we propose to use deep-learning-based models with atomic features of text – the characters – to build hierarchical, vectorial word and sentence representations for the task of trait inference. On a corpus of tweets, this method shows state-of-the-art performance across five traits and three languages (English, Spanish and Italian) compared with prior work in author profiling. The results, supported by preliminary visualisation work, are encouraging for the ability to detect complex human traits.

pdf bib abs

Gated End-to-End Memory Networks
Fei Liu | Julien Perez
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Machine reading using differentiable reasoning models has recently shown remarkable progress. In this context, End-to-End trainable Memory Networks (MemN2N) have demonstrated promising performance on simple natural language based reasoning tasks such as factual reasoning and basic deduction. However, other tasks, namely multi-fact question-answering, positional reasoning or dialog related tasks, remain challenging particularly due to the necessity of more complex interactions between the memory and controller modules composing this family of models. In this paper, we introduce a novel end-to-end memory access regulation mechanism inspired by the current progress on the connection short-cutting principle in the field of computer vision. Concretely, we develop a Gated End-to-End trainable Memory Network architecture (GMemN2N). From the machine learning perspective, this new capability is learned in an end-to-end fashion without the use of any additional supervision signal which is, as far as our knowledge goes, the first of its kind. Our experiments show significant improvements on the most challenging tasks in the 20 bAbI dataset, without the use of any domain knowledge. Then, we show improvements on the Dialog bAbI tasks including the real human-bot conversion-based Dialog State Tracking Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state of the art.

2016

pdf bib abs

A Recurrent and Compositional Model for Personality Trait Recognition from Short Texts
Fei Liu | Julien Perez | Scott Nowson
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Many methods have been used to recognise author personality traits from text, typically combining linguistic feature engineering with shallow learning models, e.g. linear regression or Support Vector Machines. This work uses deep-learning-based models and atomic features of text, the characters, to build hierarchical, vectorial word and sentence representations for trait inference. This method, applied to a corpus of tweets, shows state-of-the-art performance across five traits compared with prior work. The results, supported by preliminary visualisation work, are encouraging for the ability to detect complex human traits.

pdf bib

XRCE at SemEval-2016 Task 5: Feedbacked Ensemble Modeling on Syntactico-Semantic Knowledge for Aspect Based Sentiment Analysis
Caroline Brun | Julien Perez | Claude Roux
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Julien Perez

2025

2021

2018

2017

2016

2015

Co-authors

Venues