Mircea Petic


2014

pdf bib
Transliteration and alignment of parallel texts from Cyrillic to Latin
Mircea Petic | Daniela Gîfu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article describes a methodology of recovering and preservation of old Romanian texts and problems related to their recognition. Our focus is to create a gold corpus for Romanian language (the novella Sania), for both alphabets used in Transnistria ― Cyrillic and Latin. The resource is available for similar researches. This technology is based on transliteration and semiautomatic alignment of parallel texts at the level of letter/lexem/multiwords. We have analysed every text segment present in this corpus and discovered other conventions of writing at the level of transliteration, academic norms and editorial interventions. These conventions allowed us to elaborate and implement some new heuristics that make a correct automatic transliteration process. Sometimes the words of Latin script are modified in Cyrillic script from semantic reasons (for instance, editor’s interpretation). Semantic transliteration is seen as a good practice in introducing multiwords from Cyrillic to Latin. Not only does it preserve how a multiwords sound in the source script, but also enables the translator to modify in the original text (here, choosing the most common sense of an expression). Such a technology could be of interest to lexicographers, but also to specialists in computational linguistics to improve the actual transliteration standards.

2011

pdf bib
Creation and Development of the Romanian Lexical Resources
Elena Boian | Constantin Ciubotaru | Svetlana Cojocaru | Alexandru Colesnicov | Ludmila Malahov | Mircea Petic
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011