LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts
Klaus Müller | Aleksej Tikhonov | Roland Meyer
Proceedings of the Twelfth Language Resources and Evaluation Conference
We propose a mixed methods approach to the identification of scribes and authors in handwritten documents, and present LiViTo, a software tool which combines linguistic insights and computer vision techniques in order to assist researchers in the analysis of handwritten historical documents. Our research shows that it is feasible to train neural networks for the automatic transcription of handwritten documents and to use these transcriptions as input for further learning processes. Hypotheses about scribes can be tested effectively by extracting visual handwriting features and clustering them appropriately. Methods from linguistics and from computer vision research integrate into a mixed methods system, with benefits on both sides. LiViTo was trained with historical Czech texts by 18th century immigrants to Berlin, a total of 564 pages from a corpus of about 5000 handwritten pages without indication of author or scribe. We provide an overview of the three-year development of LiViTo and an introduction into its methodology and its functions. We then present our findings concerning the corpus of Berlin Czech manuscripts and discuss possible further usage scenarios.