Cristina Vertan


2022

The Maya script is the only readable autochthonous writing system of the Americas and consists of more than 1000 word signs and syllables. It is only partially deciphered and is the subject of the project “Text Database and Dictionary of the Classic Maya”. Texts are recorded in TEI XML on the basis of a digital sign and graph catalog, which are stored in the TextGrid virtual repository. Due to the state of decipherment, it is not possible to record hieroglyphic texts directly in phonemically transliterated values. The texts are therefore documented using numeric sign codes based on Eric Thompson’s catalog of the Maya script. The workflow for converting numerical transliteration into textual form involves several steps, with variable solutions possible at each step. For this purpose, the authors have developed ALMAH, the “Annotator for the Linguistic Analysis of Maya Hieroglyphs”. The tool is a client application that allows semi-automatic generation of phonemic transliteration from numerical transliteration and enables multi-step linguistic annotation. Alternative readings can be entered, and two or more decipherment proposals can be processed in parallel. ALMAH is implemented in Java, is based on a graph data model, and has a user-friendly interface.

2019

Preservation of cultural heritage by means of digital methods has become extremely popular in recent years. After intensive digitization campaigns, the focus is slowly moving from genuine preservation (i.e. digital archiving together with standard search mechanisms) to research-oriented usage of materials available electronically. This usage is intended to go far beyond simple reading of digitized materials; researchers should be able to gain new insights into the materials and discover new facts by means of tools relying on innovative algorithms. In this article we describe the workflow necessary for the annotation of a diachronic corpus of Classical Ethiopic, a language of essential importance for the study of Early Christianity.
Many applications in Digital Humanities (DH) rely on annotations of the raw material. These annotations (inferred automatically or done manually) assume that labelled facts are either true or false, so all inferences drawn from such annotations use Boolean logic. This contradicts the hermeneutic principles used in the humanities, in which most knowledge has a degree of truth that varies depending on the experience and the world knowledge of the interpreter. In this paper we show how uncertainty and vagueness, two main features of any historical text, can be encoded in annotations and thus be considered by DH applications.

2017

Current approaches in Digital Humanities tend to ignore a central aspect of any hermeneutic introspection: the intrinsic vagueness of analyzed texts. Especially when dealing with historical documents, neglecting vagueness has important implications for the interpretation of the results. In this paper we present current limitations of annotation approaches and describe a current methodology for annotating vagueness in historical Romanian texts.

2015

2014

2013

2012

Data-driven machine translation (MT) approaches have become very popular in recent years, especially for language pairs for which it is difficult to find specialists to develop transfer rules. Statistical (SMT) or example-based (EBMT) systems can provide reasonable translation quality for assimilation purposes, as long as a large amount of training data is available. SMT systems in particular rely on parallel aligned corpora, which have to be statistically relevant for the given language pair. The construction of large domain-specific parallel corpora is time-consuming and costly; the current practice relies on one or two such large corpora per language pair. Recently developed strategies ensure a certain portability to other domains through specialized lexicons or small domain-specific corpora. In this paper we discuss the influence of different discourse styles on statistical machine translation systems. We investigate how a pure SMT system performs when training and test data belong to the same domain but the discourse style varies.

2011

2010

In recent years, mass digitization campaigns have made catalogues, valuable rare manuscripts and old printed books available via the Internet. The Manuscriptorium digital library has ingested hundreds of volumes, and this number is expected to grow in the coming years. Other European initiatives such as Europeana and Monasterium also have the online presentation of cultural heritage as a central activity. With the growth in available online volumes, special attention has been paid to the management and retrieval of documents within digital libraries. Enabling semantic technologies and intelligent linking and search are a big step forward, but they still do not succeed in making the content of old rare books intelligible to the broad public or to specialists in other domains or languages. In this paper we argue that multilingual language technologies have the potential to fill this gap. We give an overview of the existing language resources for historical documents and present an architecture which aims at presenting such texts to the ordinary user without altering the character of the texts.

2009

2007

2005

Natural language is considered the friendliest way of man-machine communication. However, the implementation of natural language interfaces often faces the problem of a lack of linguistic and world knowledge, especially when the application domain is not very specific. This is exactly the case for Web-based applications, which aim to serve the retrieval of information in everyday areas of work. Recent Semantic Web activities have led to the development of large ontologies for a broad spectrum of domains, as well as of mechanisms for annotating resources with semantic information. In this paper we present a new architecture aiming to bring together the advantages of natural language querying and the power of the Semantic Web. We also show how the described application can be easily adapted to other domains.
In this paper we give an overview of Semantic Web technologies and their impact on the multilingual Web. We present a possible solution for improving the quality of online translation systems, using mechanisms and standards from the Semantic Web. We focus on example-based machine translation and on automating the extraction of translation examples by means of RDF repositories.

2004

2003

Implementation of machine translation “toy” systems is a good practical exercise, especially for computer science students. Our aim in a series of courses on MT in 2002 was to make students familiar both with typical problems of machine translation in particular and natural language processing in general, and with software implementation. In order to simulate a software implementation process as realistically as possible, we introduced more than 20 evaluation criteria to be filled in by the students when they evaluated their own products. The criteria go far beyond such “toy” systems, but they should demonstrate to the students what real software evaluation means and what the particularities of machine translation evaluation are.

2002