Francesco Mambrini

2025

DynaMorphPro: A New Diachronic and Multilingual Lexical Resource in the LLOD ecosystem
Matteo Pellegrini | Valeria Irene Boano | Francesco Gardani | Francesco Mambrini | Giovanni Moretti | Marco Carlo Passarotti
Proceedings of the 5th Conference on Language, Data and Knowledge

This paper describes the release as Linguistic Linked Open Data of DynaMorphPro, a lexical resource recording loanwords, conversions and class-shifts from Latin to Old Italian. We show how existing vocabularies are reused and integrated to allow for a rich semantic representation of these data. Our main reference is the OntoLex-lemon model for lexical information, but classes and properties from many other ontologies are also reused to express other aspects. In particular, we identify the CIDOC Conceptual Reference Model as the ideal tool to convey chronological information on historical processes of lexical innovation and change, and describe how it can be integrated with OntoLex-lemon.

The Leibniz List as Linguistic Linked Data in the LiLa Knowledge Base
Lisa Sophie Albertelli | Giulia Calvi | Francesco Mambrini
Proceedings of the 5th Conference on Language, Data and Knowledge

This paper presents the integration of the Leibniz List, a concept list from the Concepticon project, into the LiLa Knowledge Base of Latin interoperable resources. The modeling experiment was conducted using W3C standards like OntoLex and SKOS. This work, which originated in a project for a university course, is limited to a short list of words, but it already enables interoperability between the Concepticon and the language resources in a LOD architecture like LiLa. The integration enriches the LiLa ecosystem, allowing users to explore the Latin lexicon from an onomasiological perspective and linking concepts to lexical entries from various dictionaries and corpus attestations. The work showcases how standard Semantic Web technologies can effectively model and connect historical concept lists within larger linguistic knowledge infrastructures and provides an example for further experiments with the Concepticon’s data.

2024

Exploring Neural Topic Modeling on a Classical Latin Corpus
Ginevra Martinelli | Paola Impicciché | Elisabetta Fersini | Francesco Mambrini | Marco Passarotti
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The wide availability of processable textual resources for Classical Latin has made it possible to study Latin literature through methods and tools that support distant reading. This paper describes a number of experiments carried out to test the possibility of investigating the thematic distribution of the Classical Latin corpus Opera Latina by means of topic modeling. For this purpose, we train, optimize and compare two neural models, Product-of-Experts LDA (ProdLDA) and Embedded Topic Model (ETM), suitably adapted to deal with the textual data from a Classical Latin corpus, to evaluate which one performs better both on the basis of topic diversity and topic coherence metrics, and from a human judgment point of view. Our results show that the topics extracted by neural models are coherent and interpretable and that they are significant from the perspective of a Latin scholar. The source code of the proposed model is available at https://github.com/MIND-Lab/LatinProdLDA.

The Services of the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
Marco Passarotti | Francesco Mambrini | Giovanni Moretti
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024

This paper describes three online services designed to ease the tasks of querying and populating the linguistic resources for Latin made interoperable through their publication as Linked Open Data in the LiLa Knowledge Base. As for querying the KB, we present an interface to search the collection of lemmas that represents the core of the Knowledge Base, and an interactive, graphical platform to run queries on the resources currently interlinked. As for populating the KB with new textual resources, we describe a tool that performs automatic tokenization, lemmatization and Part-of-Speech tagging of a raw text in Latin and links its tokens to LiLa.

Modelling and Linking an Old Latin-Portuguese Dictionary to the LiLa Knowledge Base
Lucas Consolin Dezotti | Marco Passarotti | Francesco Mambrini
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes the steps undertaken to include data from Antonio Velez’s bilingual Latin-Portuguese dictionary (Index Totius Artis, 1744) in the LiLa Knowledge Base of interoperable linguistic resources for Latin. The paper focuses on how the lexical and lexicographic information of the source dictionary was modelled using, respectively, the Lexicon Model for Ontologies (OntoLex-lemon) and its lexicog module. The linking process of the dictionary entries with those of the LiLa collection of Latin lemmas is detailed, discussing issues in dealing with ambiguities and typographical errors found in the source. The result is the first Latin-Portuguese lexical resource made interoperable with the (meta)data of the other linguistic resources for Latin interlinked in the LiLa Knowledge Base, providing new ways of assessing the dictionary information or using its content as a starting point to explore the connections with other interlinked linguistic resources. A couple of use case scenarios illustrate those possibilities.

Lifeless Winter without Break: Ovid’s Exile Works and the LiLa Knowledge Base
Aurora Alagni | Francesco Mambrini | Marco Passarotti
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

In this paper we describe the process of semi-automatic annotation and linking performed to connect two works by the Latin poet Ovid to the LiLa Knowledge Base of interoperable linguistic resources. Written after Ovid’s exile from Rome, the Tristia and the Epistulae ex Ponto mark the beginning of the “literature of exile”. In spite of their importance, no lemmatized version existed and the two collections were not part of the major annotated corpora linked to LiLa. The paper discusses the workflow used to annotate and publish the works as Linked Open Data connected to the LiLa Knowledge Base. On account of their subject and the emotional tone attached to the theme of exile, the two works are particularly relevant for sentiment analysis. We discuss some results of a lexicon-based analysis that is enabled by the interlinking with LiLa. We use LatinAffectus, a manually-generated polarity lexicon for Latin nouns and adjectives, to perform Sentiment Analysis on the aforementioned works and interpret the (replicable) results by consulting and simultaneously enriching the available literary scholarship with new information.

2023

Modelling and Publishing the “Lexicon der indogermanischen Verben” as Linked Open Data
Valeria Irene Boano | Francesco Mambrini | Marco Passarotti | Riccardo Ginevra
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2022

The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base
Francesco Mambrini | Marco Passarotti | Giovanni Moretti | Matteo Pellegrini
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Although the Universal Dependencies initiative today allows for cross-linguistically consistent annotation of morphology and syntax in treebanks for several languages, syntactically annotated corpora are not yet interoperable with many lexical resources that describe properties of the words that occur therein. To address this limitation, we propose to adopt the principles of the Linguistic Linked Open Data community to describe and publish dependency treebanks as LLOD. In particular, this paper illustrates the approach pursued in the LiLa Knowledge Base, which enables interoperability between corpora and lexical resources for Latin, to publish as Linguistic Linked Open Data the annotation layers of two versions of a Medieval Latin treebank (the Index Thomisticus Treebank).

Linking the LASLA Corpus in the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
Margherita Fantoli | Marco Passarotti | Francesco Mambrini | Giovanni Moretti | Paolo Ruffolo
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

This paper describes the process of interlinking the 130 Classical Latin texts provided by an annotated corpus developed at the LASLA laboratory with the LiLa Knowledge Base, which makes linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm and making reference to classes and properties of widely adopted ontologies to model the relevant information. After introducing the overall architecture of the LiLa Knowledge Base and the LASLA corpus, the paper details the phases of the process of linking the corpus with the collection of lemmas of LiLa and presents a federated query to exemplify the added value of interoperability of LASLA’s texts with other resources for Latin.

2021

Sentiment Analysis of Latin Poetry: First Experiments on the Odes of Horace
Rachele Sprugnoli | Francesco Mambrini | Marco Passarotti | Giovanni Moretti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

Linking the Lewis & Short Dictionary to the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
Francesco Mambrini | Eleonora Litta | Marco Passarotti | Paolo Ruffolo
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

Græcissare: Ancient Greek Loanwords in the LiLa Knowledge Base of Linguistic Resources for Latin
Greta Franzini | Federica Zampedri | Marco Passarotti | Francesco Mambrini | Giovanni Moretti
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Representing Etymology in the LiLa Knowledge Base of Linguistic Resources for Latin
Francesco Mambrini | Marco Passarotti
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

In this paper we describe the process of inclusion of etymological information in a knowledge base of interoperable Latin linguistic resources developed in the context of the LiLa: Linking Latin project. Interoperability is obtained by applying the Linked Open Data principles. In particular, an extensive collection of Latin lemmas is used to link the (distributed) resources. For the etymology, we rely on the OntoLex-lemon ontology and the lemonEty extension to model the information, while the source data are taken from a recent etymological dictionary of Latin. As a result, the collection of lemmas around which LiLa is built now includes 1,465 Proto-Italic and 1,393 Proto-Indo-European reconstructed forms that are used to explain the history of 1,400 Latin words. We discuss the motivation, methodology and modeling strategies of the work, as well as its possible applications and potential future developments.

2019

Linked Open Treebanks. Interlinking Syntactically Annotated Corpora in the LiLa Knowledge Base of Linguistic Resources for Latin
Francesco Mambrini | Marco Passarotti
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin
Francesco Mambrini | Marco Passarotti
Proceedings of the 13th Linguistic Annotation Workshop

The interoperability between lemmatized corpora of Latin and other resources that use the lemma as indexing key is hampered by the multiple lemmatization strategies that different projects adopt. In this paper we discuss how we tackle the challenges raised by harmonizing different lemmatization criteria in the context of a project that aims to connect linguistic resources for Latin using the Linked Data paradigm. The paper introduces the architecture supporting an open-ended, lemma-based Knowledge Base, built to make textual and lexical resources for Latin interoperable. In particular, the paper describes the inclusion into the Knowledge Base of its lexical basis, of a word formation lexicon and of a lemmatized and syntactically annotated corpus.

The Treatment of Word Formation in the LiLa Knowledge Base of Linguistic Resources for Latin
Eleonora Litta | Marco Passarotti | Francesco Mambrini
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology

2018

The iDAI.publication: Extracting and Linking Information in the Publications of the German Archaeological Institute (DAI)
Francesco Mambrini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2013

Non-Projectivity in the Ancient Greek Dependency Treebank
Francesco Mambrini | Marco Passarotti
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin
Marco Passarotti | Francesco Mambrini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Although the lexicography of Latin has a long tradition dating back to ancient grammarians, and almost all Latin grammars devote at least one part of the section(s) concerning morphology to wordformation, none of the lexical resources and NLP tools currently available for Latin features a wordformation-based organization of the Latin lexicon. In this paper, we describe the first steps towards the semi-automatic development of a wordformation-based lexicon of Latin, by detailing several problems occurring while building the lexicon and presenting our solutions. Developing a wordformation-based lexicon of Latin is nowadays of utmost importance, as recent years have seen a large growth of annotated corpora of Latin texts of different eras. While these corpora include lemmatization, morphological tagging and syntactic analysis, none of them features segmentation of the word forms and wordformation relations between the lexemes. This restricts the browsing and the exploitation of the annotated data for linguistic research and NLP tasks, such as information retrieval and heuristics in PoS tagging of unknown words.