2024
DigItAnt: a platform for creating, linking and exploiting LOD lexica with heterogeneous resources
Michele Mallia | Michela Bandini | Andrea Bellandi | Francesca Murano | Silvia Piccini | Luca Rigobianco | Alessandro Tommasi | Cesare Zavattari | Mariarosaria Zinzi | Valeria Quochi
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024
Over the past few years, the deployment of Linked Open Data (LOD) technologies has witnessed significant advancements across a myriad of sectors, linguistics included. This progression is characterized by an exponential increase in the conversion of resources to adhere to contemporary encoding standards. Such transformations are driven by the objectives outlined in “ecological” methodologies, notably the FAIR data principles, which advocate for the reuse and interoperability of resources. This paper presents the DigItAnt architecture, developed in the context of a national project funded by the Italian Ministry of Research and in the service of a recently started Italian endeavor to realize a federation of infrastructures for the humanities. It details its services, utilities and data types, and shows how it manages to produce, exploit and interlink LLOD and non-LLOD datasets in ways that are meaningful to its intended target disciplinary context, i.e. historical linguistics over epigraphy data. The paper also introduces how DigItAnt services and functionalities will contribute to the empowerment of the H2IOSC Italian infrastructures cluster project, which is devoted to the construction of a nationwide research infrastructure federation for the humanities, and it will possibly contribute to its pilot project towards an authoritative LLOD platform.
Tracing Linguistic Heritage: Constructing a Somali-Italian Terminological Resource through Explorers’ Notebooks and Contemporary Corpus Analysis
Silvia Piccini | Giuliana Elizabeth Vilela Ruiz | Andrea Bellandi | Enrico Carniani
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
The aim of this contribution is to introduce the initial phases of constructing a Somali-Italian terminological resource that dates back to Italy’s colonial expansion into Africa. Specifically, the terminological data was extracted from the notebooks authored by the Italian explorer Ugo Ferrandi (1852 - 1928) and published by the Società Geografica in 1903 under the title “Lugh. Emporio Commerciale sul Giuba”. In order to develop Ferrandi’s terminological resource, we have employed Semantic Web technologies (RDF, OWL, and SPARQL) and embraced the Linked Open Data paradigm. This ensures the FAIRness of the data and enables the publication and sharing of our terminological resource within an open interconnected Web of Data, thus contributing to addressing the absence of Somali in the Linguistic Linked Data cloud. Whenever feasible, Ferrandi’s lexicon entries have been linked and enriched with information derived from a Somali lexicon included in a contemporary Somali Corpus. This approach allows the synchronic corpus-related Somali lexicon to acquire historical depth, thereby illuminating the linguistic dynamics that have transpired over time and would otherwise have remained obscure.
2023
The Importance of Being Interoperable: Theoretical and Practical Implications in Converting TBX to OntoLex-Lemon
Andrea Bellandi | Giorgio Maria Di Nunzio | Silvia Piccini | Federica Vezzani
Proceedings of the 4th Conference on Language, Data and Knowledge
2022
From Inscriptions to Lexica and Back: A Platform for Editing and Linking the Languages of Ancient Italy
Valeria Quochi | Andrea Bellandi | Fahad Khan | Michele Mallia | Francesca Murano | Silvia Piccini | Luca Rigobianco | Alessandro Tommasi | Cesare Zavattari
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Available language technology is hardly applicable to scarcely attested ancient languages, yet their digital semantic representation, though challenging, is an asset for the purpose of sharing and preserving existing cultural knowledge. In the context of a project on the languages and cultures of ancient Italy, we took up this challenge. The paper thus describes the development of a user-friendly web platform, EpiLexO, for the creation and editing of an integrated system of language resources for ancient fragmentary languages centered on the lexicon, in compliance with current digital humanities and Linked Open Data principles. EpiLexO allows for the editing of lexica with all relevant cross-references, linking them to their testimonies as well as to bibliographic information and other (external) resources and common vocabularies. The focus of the current implementation is on the languages of ancient Italy, in particular Oscan, Faliscan, Celtic and Venetic; however, the technological solutions are designed to be general enough to be potentially applicable to different scenarios.
2017
Developing LexO: a Collaborative Editor of Multilingual Lexica and Termino-Ontological Resources in the Humanities
Andrea Bellandi | Emiliano Giovannetti | Silvia Piccini | Anja Weingart
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)
2014
Sharing Cultural Heritage: the Clavius on the Web Project
Matteo Abrate | Angelo Mario Del Grosso | Emiliano Giovannetti | Angelica Lo Duca | Damiana Luzzi | Lorenzo Mancini | Andrea Marchetti | Irene Pedretti | Silvia Piccini
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In the last few years the number of manuscripts digitized and made available on the Web has been constantly increasing. However, there is still a considerable lack of results concerning both making their content explicit and the tools developed to make it available. The objective of the Clavius on the Web project is to develop a Web platform exposing a selection of Christophorus Clavius's letters along with three different levels of analysis: linguistic, lexical and semantic. The multilayered annotation of the corpus involves an XML-TEI encoding followed by a tokenization step where each token is uniquely identified through a CTS URN and then associated with a part of speech and a lemma. The text is lexically and semantically annotated on the basis of a lexicon and a domain ontology, the former structuring the most relevant terms occurring in the text and the latter representing the domain entities of interest (e.g. people, places, etc.). Moreover, each entity is connected to linked and non-linked resources, including DBpedia and VIAF. Finally, the results of the three layers of analysis are gathered and shown through interactive visualization and storytelling techniques. A demo version of the integrated architecture was developed.