SubmissionNumber#=%=#24
FinalPaperTitle#=%=#Catalogues as Data: Interpretable NLP Pipelines for Ottoman-Turkish Bibliographies
ShortPaperTitle#=%=#
NumberOfPages#=%=#7
CopyrightSigned#=%=#MJH
JobTitle#==#
Organization#==#
Abstract#==#Bibliographies are both humanities infrastructure and historical record. Analysing them computationally, however, requires complex digitisation and standardisation decisions. This paper takes Seyfettin Özege's Eski Harflerle Basılmış Türkçe Eserler Kataloğu as an example: a scanned set of volumes marked by complex page layouts, degraded typography, irregular entry structures, and historically contingent inconsistencies. We present a pipeline that constructs a structured, machine-readable, and analysable dataset from the 27,000 entries using computer vision, OCR, large language models and visual language models, sequence-based validation, and custom review tools. The pipeline captures 97.8% of records, and the remaining cases can be addressed through targeted review. The results demonstrate that combining LLMs with interpretable, review-centric pipelines offers an appropriate approach for historically complex bibliographic sources.
Author{1}{Firstname}#=%=#Mark J.
Author{1}{Lastname}#=%=#Hill
Author{1}{Username}#=%=#mark.j.hill
Author{1}{Orcid}#=%=#0000-0001-7273-1775
Author{1}{Email}#=%=#mark.j.hill@kcl.ac.uk
Author{1}{Affiliation}#=%=#King's College London
Author{2}{Firstname}#=%=#Ayse
Author{2}{Lastname}#=%=#Bulus
Author{2}{Orcid}#=%=#
Author{2}{Email}#=%=#ayse.bulus@kcl.ac.uk
Author{2}{Affiliation}#=%=#King's College London
Author{3}{Firstname}#=%=#Paul
Author{3}{Lastname}#=%=#Spence
Author{3}{Orcid}#=%=#
Author{3}{Email}#=%=#paul.spence@kcl.ac.uk
Author{3}{Affiliation}#=%=#King's College London

==========
èéáğö