Antoine Bourgois


2025

The Elephant in the Coreference Room: Resolving Coreference in Full-Length French Fiction Works
Antoine Bourgois | Thierry Poibeau
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference

While coreference resolution is attracting more interest than ever from computational literature researchers, representative datasets of fully annotated long documents remain surprisingly scarce. In this paper, we introduce a new annotated corpus of three full-length French novels, totaling over 285,000 tokens. Unlike previous datasets focused on shorter texts, our corpus addresses the challenges posed by long, complex literary works, enabling the evaluation of coreference models on long reference chains. We present a modular coreference resolution pipeline that allows for fine-grained error analysis. We show that our approach is competitive and scales effectively to long documents. Finally, we demonstrate its usefulness for inferring the gender of fictional characters, showcasing its relevance for both literary analysis and downstream NLP tasks.

GLaRef@CRAC2025: Should we transform coreference resolution into a text generation task?
Olga Seminck | Antoine Bourgois | Yoann Dupont | Mathieu Dehouck | Marine Delaborde
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference

We present the submissions of our team to the Unconstrained and LLM tracks of the Computational Models of Reference, Anaphora and Coreference (CRAC2025) shared task, where we finished fifth and first, respectively. The two systems achieved similar scores, with average CoNLL-F1 scores of 61.57 and 62.96 on the test set, but at very different computational costs: the classical pairwise resolution system submitted to the Unconstrained track reached comparable performance with less than 10% of the computational cost. Reflecting on this fact, we point out problems we ran into when using generative AI to perform coreference resolution, and we explain how the text-generation framework stands in the way of a reliable, text-global coreference representation. Nonetheless, we see many potential improvements to our LLM system; we discuss them at the end of this article.