%0 Conference Proceedings
%T Logical Layout Analysis Applied to Historical Newspapers
%A Gutehrlé, Nicolas
%A Atanassova, Iana
%Y Hämäläinen, Mika
%Y Alnajjar, Khalid
%Y Partanen, Niko
%Y Rueter, Jack
%S Proceedings of the Workshop on Natural Language Processing for Digital Humanities
%D 2021
%8 December
%I NLP Association of India (NLPAI)
%C NIT Silchar, India
%F gutehrle-atanassova-2021-logical
%X In recent years, libraries and archives led important digitisation campaigns that opened the access to vast collections of historical documents. While such documents are often available as XML ALTO documents, they lack information about their logical structure. In this paper, we address the problem of logical layout analysis applied to historical documents. We propose a method which is based on the study of a dataset in order to identify rules that assign logical labels to both block and lines of text from XML ALTO documents. Our dataset contains newspapers in French, published in the first half of the 20th century. The evaluation shows that our methodology performs well for the identification of first lines of paragraphs and text lines, with F1 above 0.9. The identification of titles obtains an F1 of 0.64. This method can be applied to preprocess XML ALTO documents in preparation for downstream tasks, and also to annotate large-scale datasets to train machine learning and deep learning algorithms.
%U https://aclanthology.org/2021.nlp4dh-1.10
%P 85-94