SubmissionNumber#=%=#39
FinalPaperTitle#=%=#Detecting reported speech as a token classification task: an application to Classical Latin
ShortPaperTitle#=%=#
NumberOfPages#=%=#6
CopyrightSigned#=%=#Agustin Dei
JobTitle#==#
Organization#==#Sorbonne Université 
21, rue de l'École de Médecine
75006 Paris
Abstract#==#This paper presents the first application of a token-classification approach to the automatic detection of reported speech spans in Classical Latin, using transformer-based neural architectures. Focusing on Seneca the Elder's Declamatory Anthology, the study addresses the highly polyphonic nature of the text, which results from its extensive use of reported speech. Instead of relying exclusively on sentence-level syntactic information, the proposed approach treats reported speech detection as a token-level sequence labeling problem, which makes it possible to identify reported speech spans that extend across multiple sentences. We fine-tune three Latin neural language models (LatinBERT, LaBERTa, and PhilBERTa) for binary token-level classification and run experiments both with and without punctuation. The results show that the RoBERTa-based models identify reported speech effectively, with LaBERTa achieving the best performance (F1 scores above 0.90).
Author{1}{Firstname}#=%=#Agustin
Author{1}{Lastname}#=%=#Dei
Author{1}{Orcid}#=%=#
Author{1}{Email}#=%=#agustin.dei@sorbonne-universite.fr
Author{1}{Affiliation}#=%=#Sorbonne Université

==========
èéáğö