Linguistic Annotation of Neo-Latin Mathematical Texts: A Pilot-Study to Improve the Automatic Parsing of the Archimedes Latinus

Margherita Fantoli; Miryam de Lhoneux

Abstract

This paper describes the process of syntactically parsing Jacopo da San Cassiano’s Latin translation of the Greek mathematical work The Spirals of Archimedes, adopting the Universal Dependencies formalism. First, we introduce the historical and linguistic importance of Jacopo da San Cassiano’s translation. We then describe the deep biaffine parser used for this pilot study and, in particular, motivate the use of treebank embeddings in light of the characteristics of mathematical texts. The paper then details the creation of the training and test data, highlighting the most compelling linguistic features of the text and the annotation choices implemented in the current version of the treebank. Finally, the parsing results are compared to a baseline and the most prominent errors are discussed. Overall, the paper shows the added value of creating specific training data and of using targeted strategies (such as treebank embeddings) to exploit existing annotated corpora while preserving the features of one specific text when performing syntactic parsing.
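As a rough illustration of the treebank-embedding idea mentioned in the abstract, the sketch below concatenates a per-treebank vector to every token vector of a sentence, so that a parser trained on several corpora can still tell which corpus a sentence comes from. It is a minimal sketch under stated assumptions, not the authors' implementation: the treebank names `general_latin` and `archimedes`, the dimensions, the helper names, and the random vectors (standing in for learned parameters) are all hypothetical.

```python
import random

WORD_DIM, TB_DIM = 4, 2  # hypothetical sizes, chosen only for readability


def make_table(keys, dim, seed):
    """Build a lookup table of random vectors (stand-ins for learned embeddings)."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


# Hypothetical vocabulary and treebank identifiers.
word_emb = make_table(["linea", "recta", "est"], WORD_DIM, seed=0)
treebank_emb = make_table(["general_latin", "archimedes"], TB_DIM, seed=1)


def encode(tokens, treebank_id):
    """Concatenate each token's word vector with the vector of its source treebank."""
    tb_vector = treebank_emb[treebank_id]
    return [word_emb[token] + tb_vector for token in tokens]


sentence = ["linea", "recta", "est"]
vectors = encode(sentence, "archimedes")
vectors_general = encode(sentence, "general_latin")
```

In this sketch the same sentence receives identical word components under both treebank identifiers and differs only in the final `TB_DIM` values, which is the signal a downstream parser could use to adapt to one specific text.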

Anthology ID: 2022.lt4hala-1.18
Volume: Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Month: June
Year: 2022
Address: Marseille, France
Editors: Rachele Sprugnoli, Marco Passarotti
Venue: LT4HALA
Publisher: European Language Resources Association
Pages: 129–134
URL: https://aclanthology.org/2022.lt4hala-1.18/
Cite (ACL): Margherita Fantoli and Miryam de Lhoneux. 2022. Linguistic Annotation of Neo-Latin Mathematical Texts: A Pilot-Study to Improve the Automatic Parsing of the Archimedes Latinus. In Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages, pages 129–134, Marseille, France. European Language Resources Association.
Cite (Informal): Linguistic Annotation of Neo-Latin Mathematical Texts: A Pilot-Study to Improve the Automatic Parsing of the Archimedes Latinus (Fantoli & de Lhoneux, LT4HALA 2022)
PDF: https://aclanthology.org/2022.lt4hala-1.18.pdf
