Marie-Laurence Bonhomme


2020

Books of Hours. the First Liturgical Data Set for Text Segmentation.
Amir Hazem | Beatrice Daille | Christopher Kermorvant | Dominique Stutzmann | Marie-Laurence Bonhomme | Martin Maarand | Mélodie Boillet
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Book of Hours was the bestseller of the late Middle Ages and Renaissance. It is an invaluable historical treasure, documenting the devotional practices of Christians in the late Middle Ages. Up to now, its textual content has been scarcely studied because of its manuscript nature, its length and its complex content. At first glance, it looks too standardized. However, the study of books of hours raises important challenges: (i) in image analysis, its often lavish ornamentation (illegible painted initials, line-fillers, etc.), abbreviated words and multilingualism are difficult to address in Handwritten Text Recognition (HTR); (ii) its hierarchical, entangled structure offers a new field of investigation for text segmentation; (iii) in digital humanities, its textual content gives opportunities for historical analysis. In this paper, we provide the first corpus of books of hours, which consists of Latin transcriptions of 300 books of hours generated by HTR, a process akin to Optical Character Recognition (OCR) but applied to handwritten rather than printed texts. We designed a structural scheme of the book of hours and manually annotated two books of hours according to this scheme. Lastly, we performed a systematic evaluation of the main state-of-the-art text segmentation approaches.

2019

Transcription automatique et segmentation thématique de livres d’heures manuscrits [Automatic transcription and thematic segmentation of Books of Hours]
Béatrice Daille | Amir Hazem | Christopher Kermorvant | Martin Maarand | Marie-Laurence Bonhomme | Dominique Stutzmann | Jacob Currie | Christine Jacquin
Traitement Automatique des Langues, Volume 60, Numéro 3 : TAL et humanités numériques [NLP and Digital Humanities]