LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts

Klaus Müller, Aleksej Tikhonov, Roland Meyer


Abstract
We propose a mixed-methods approach to the identification of scribes and authors in handwritten documents, and present LiViTo, a software tool that combines linguistic insights and computer vision techniques to assist researchers in the analysis of handwritten historical documents. Our research shows that it is feasible to train neural networks for the automatic transcription of handwritten documents and to use these transcriptions as input for further learning processes. Hypotheses about scribes can be tested effectively by extracting visual handwriting features and clustering them appropriately. Methods from linguistics and from computer vision research are integrated into a mixed-methods system, with benefits for both sides. LiViTo was trained on historical Czech texts by 18th-century immigrants to Berlin, a total of 564 pages from a corpus of about 5000 handwritten pages without indication of author or scribe. We provide an overview of the three-year development of LiViTo and an introduction to its methodology and its functions. We then present our findings concerning the corpus of Berlin Czech manuscripts and discuss possible further usage scenarios.
Anthology ID:
2020.lrec-1.111
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
885–890
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.111
Cite (ACL):
Klaus Müller, Aleksej Tikhonov, and Roland Meyer. 2020. LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 885–890, Marseille, France. European Language Resources Association.
Cite (Informal):
LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts (Müller et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.111.pdf