Gaspard Michel

2025

Evaluating LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3
Gaspard Michel | Elena V. Epure | Romain Hennequin | Christophe Cerisara
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Large Language Models (LLMs) have shown promising results in a variety of literary tasks, often using complex memorized details of narration and fictional characters. In this work, we evaluate the ability of Llama-3 to attribute utterances of direct speech to their speakers in novels. The LLM shows impressive results on a corpus of 28 novels, surpassing published results with ChatGPT and encoder-based baselines by a large margin. We then validate these results by assessing the impact of book memorization and annotation contamination. We found that these types of memorization do not explain the large performance gain, making Llama-3 the new state of the art for quotation attribution in English literature. We publicly release our code and data.

Evaluating LLMs for Quotation Attribution in Literary Texts: A Study of LLaMa3
Gaspard Michel | Elena V. Epure | Romain Hennequin | Christophe Cerisara
Proceedings of the 32nd Conference on Natural Language Processing (TALN), Volume 2: Translations of Published Articles

Large language models (LLMs) have shown promising results in various literary tasks, often linked to the memorization of complex details about narration and fictional characters. In this article, we evaluate the ability of Llama-3 to attribute quotations to their speaker in English novels from the 18th to the 20th century. The LLM achieves impressive results on a corpus of 28 novels, largely surpassing the published performance of ChatGPT and of models based on powerful text encoders. We then validate these results by analyzing the impact of the memorization of book passages and of possible annotation contamination. Our analyses show that these forms of memorization do not explain the large performance gain, thus establishing Llama-3 as the new state of the art for quotation attribution in English literature. The article is available at: https://aclanthology.org/2025.naacl-short.62/

2024

Improving Quotation Attribution with Fictional Character Embeddings
Gaspard Michel | Elena V. Epure | Romain Hennequin | Christophe Cerisara
Findings of the Association for Computational Linguistics: EMNLP 2024

Humans naturally attribute utterances of direct speech to their speakers in literary works. When attributing quotes, we process contextual information but also access mental representations of characters that we build and revise throughout the narrative. Recent methods to automatically attribute such utterances have explored simulating human logic with deterministic rules or learning new implicit rules with neural networks when processing contextual information. However, these systems inherently lack character representations, which often leads to errors in the more challenging types of attribution: anaphoric and implicit quotes. In this work, we propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global stylistic information about characters, derived from an off-the-shelf stylometric model, Universal Authorship Representation (UAR). We create DramaCV, a corpus of English drama plays from the 15th to the 20th century that we automatically annotate for Authorship Verification of fictional characters' utterances, and release two versions of UAR trained on DramaCV that are tailored to the analysis of literary characters. Then, through an extensive evaluation on 28 novels, we show that combining BookNLP’s contextual information with our proposed global character embeddings improves speaker identification for anaphoric and implicit quotes, reaching state-of-the-art performance. Code and data can be found at https://github.com/deezer/character_embeddings_qa.
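The abstract does not spell out how BookNLP's contextual information and the UAR character embeddings are combined. As a rough illustration only, the sketch below assumes a simple linear interpolation: each candidate speaker gets a contextual score (a stand-in for BookNLP's output) plus the cosine similarity between the quote's stylistic embedding and the character's embedding (a stand-in for UAR vectors). The functions `cosine` and `fuse_scores`, the weight `alpha`, and the toy vectors are hypothetical and are not taken from the released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def fuse_scores(context_scores, quote_emb, char_embs, alpha=0.5):
    """Rank candidate speakers by mixing a contextual score with stylistic similarity.

    Hypothetical interface, not BookNLP's or UAR's API:
      context_scores: dict speaker -> score from a contextual attribution model
      quote_emb:      stylistic embedding of the quote to attribute
      char_embs:      dict speaker -> stylistic embedding of that character
      alpha:          weight on the contextual score (1.0 = context only)
    Returns the best speaker and the dict of fused scores.
    """
    fused = {}
    for speaker, ctx in context_scores.items():
        style = cosine(quote_emb, char_embs[speaker])
        fused[speaker] = alpha * ctx + (1 - alpha) * style
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Toy example: context slightly prefers Ben, style strongly matches Anna.
    context = {"Anna": 0.45, "Ben": 0.55}
    quote = [0.9, 0.1]
    characters = {"Anna": [1.0, 0.0], "Ben": [0.0, 1.0]}
    print(fuse_scores(context, quote, characters, alpha=1.0)[0])  # Ben
    print(fuse_scores(context, quote, characters, alpha=0.5)[0])  # Anna
```

With `alpha=1.0` the ranking falls back to contextual scores alone; lowering `alpha` lets a strong stylistic match override a weak contextual preference, which is the intuition for why character representations could help when local context is thin, as with anaphoric and implicit quotes.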

Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution
Gaspard Michel | Elena Epure | Romain Hennequin | Christophe Cerisara
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

Recent approaches to automatically detect the speaker of an utterance of direct speech often disregard general information about characters in favor of local information found in the context, such as surrounding mentions of entities. In this work, we explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models on a large corpus of English novels (the Project Dialogism Novel Corpus). Results suggest that the combination of stylistic and topical information captured by some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes. However, these results vary across novels, and further investigation of stylometric models specifically tailored to literary texts and to the study of characters is needed.
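One simple way to probe whether such stylistic representations separate characters is nearest-centroid attribution: average the embeddings of each character's known quotes, then assign a new quote to the character whose centroid is most similar. The sketch below assumes the quote embeddings have already been produced by some Authorship Verification encoder; the helpers `centroids` and `attribute`, the optional `candidates` restriction, and the two-dimensional toy vectors are hypothetical and do not reproduce the paper's evaluation protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def centroids(labelled_quotes):
    """Average the embeddings of each character's known quotes.

    labelled_quotes: list of (character, embedding) pairs (hypothetical input format).
    Returns a dict character -> mean embedding.
    """
    sums = {}
    counts = {}
    for char, emb in labelled_quotes:
        if char not in sums:
            sums[char] = [0.0] * len(emb)
            counts[char] = 0
        sums[char] = [s + e for s, e in zip(sums[char], emb)]
        counts[char] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def attribute(quote_emb, char_centroids, candidates=None):
    """Return the candidate character whose centroid is most similar to the quote.

    candidates: optional list restricting the choice (e.g. characters present in a scene).
    """
    pool = candidates if candidates is not None else list(char_centroids)
    return max(pool, key=lambda c: cosine(quote_emb, char_centroids[c]))


if __name__ == "__main__":
    known = [("Anna", [1.0, 0.0]), ("Anna", [0.8, 0.2]),
             ("Ben", [0.1, 0.9]), ("Ben", [0.0, 1.0])]
    cents = centroids(known)
    print(attribute([0.7, 0.3], cents))  # Anna
    print(attribute([0.2, 0.8], cents))  # Ben
```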

2023

Automatic Annotation of Direct Speech in Written French Narratives
Noé Durandard | Viet Anh Tran | Gaspard Michel | Elena Epure
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The automatic annotation of direct speech (AADS) in written text has often been used in computational narrative understanding. Methods based on either rules or deep neural networks have been explored, in particular for English and German. Yet for French, our target language, few works exist. Our goal is to create a unified framework to design and evaluate AADS models in French. To this end, we consolidated the largest French narrative dataset to date annotated word by word for direct speech (DS); we adapted various baselines from sequence labelling or from AADS in other languages; and we designed and conducted an extensive evaluation focused on generalisation. Results show that the task still requires substantial effort and highlight the characteristics of each baseline. Although this framework could be improved, it is a step toward encouraging more research on the topic.
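Annotating direct speech word by word turns AADS into a sequence labelling problem. Assuming a binary scheme in which each word is labelled 1 if it belongs to direct speech and 0 otherwise (the dataset's actual label set and its treatment of quotation marks may differ), the sketch below groups labels into contiguous DS spans and computes word-level precision, recall and F1 for the DS class. The functions `ds_spans` and `word_prf` and the toy sentence are hypothetical.

```python
def ds_spans(labels):
    """Group consecutive DS words (label 1) into (start, end) spans, end exclusive."""
    spans = []
    start = None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i  # a DS run begins
        elif lab != 1 and start is not None:
            spans.append((start, i))  # the DS run ended at the previous word
            start = None
    if start is not None:
        spans.append((start, len(labels)))  # DS run reaches the end of the text
    return spans


def word_prf(gold, pred):
    """Word-level precision, recall and F1 for the DS class (label 1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    words = ["«", "Viens", "ici", "»", ",", "dit", "-elle", "."]
    gold = [0, 1, 1, 0, 0, 0, 0, 0]  # toy labels: only the spoken words are DS
    pred = [0, 1, 0, 0, 0, 0, 0, 0]  # a model that misses "ici"
    print([" ".join(words[s:e]) for s, e in ds_spans(gold)])  # ['Viens ici']
    print(word_prf(gold, pred))  # precision 1.0, recall 0.5
```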
