Elena Chiocchetti


2024

Training an NMT system for legal texts of a low-resource language variety South Tyrolean German - Italian
Antoni Oliver | Sergi Alvarez-Vidal | Egon Stemle | Elena Chiocchetti
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

This paper illustrates the process of training and evaluating NMT systems for a language pair that includes a low-resource language variety. A parallel corpus of legal texts for Italian and South Tyrolean German has been compiled, with South Tyrolean German being the low-resource language variety. As the size of the compiled corpus is insufficient for training, we have combined the corpus with several parallel corpora using data weighting at sentence level. We then performed an evaluation of each combination and of two popular commercial systems.

2023

The MT@BZ corpus: machine translation & legal language
Flavia De Camillis | Egon W. Stemle | Elena Chiocchetti | Francesco Fernicola
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

The paper reports on the creation, annotation and curation of the MT@BZ corpus, a bilingual (Italian–South Tyrolean German) corpus of machine-translated legal texts from the officially multilingual Province of Bolzano, Italy. It is the first human error-annotated corpus (using an adapted SCATE taxonomy) of machine-translated legal texts in this language combination that includes a lesser-used standard variety. The data of the project will be made available on GitHub and another repository. The output of the customized engine achieved notably better BLEU, TER and chrF2 scores than the baseline. Over 50% of the segments needed no human revision due to customization. The most frequent error categories were mistranslations and bilingual (legal) terminology errors. Our contribution brings fine-grained insights to machine translation evaluation research, as it concerns a less common language combination, a lesser-used language variety and a societally relevant specialized domain. Such results are necessary to implement and inform the use of MT in institutional contexts of smaller language communities.

2006

Multilingual Legal Terminology on the Jibiki Platform: The LexALP Project
Gilles Sérasset | Francis Brunet-Manquat | Elena Chiocchetti
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

The LexALP Information System: Term Bank and Corpus for Multilingual Legal Terminology Consolidated
Verena Lyding | Elena Chiocchetti | Gilles Sérasset | Francis Brunet-Manquat
Proceedings of the Workshop on Multilingual Language Resources and Interoperability