Mirko Tavosanis

2024

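Confronto tra Diversi Tipi di Valutazione del Miglioramento della Chiarezza di Testi Amministrativi in Lingua Italiana (Comparison of Different Types of Evaluation of Clarity Improvement in Italian-Language Administrative Texts)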
Mariachiara Pascucci | Mirko Tavosanis
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

The paper compares different ways of evaluating Italian administrative texts that had undergone a clarity improvement intervention. The clarity improvement was performed by human experts and by ChatGPT. The evaluation was carried out in four ways: by expert evaluators, used as a reference; by evaluators with good skills who received dedicated training; by generic evaluators recruited through a crowdsourcing platform; and by ChatGPT. The results closest to those of the expert evaluators came, by a wide margin, from the skilled evaluators with dedicated training; ChatGPT's evaluation was the second closest; the generic evaluators recruited through a crowdsourcing platform were the furthest from the reference. Task features that may have influenced the outcome are also discussed.

2020

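Valutazione umana di DeepL a livello di frase per le traduzioni di testi specialistici dall’inglese verso l’italiano (Sentence-level human evaluation of DeepL for translations of specialized texts from English into Italian)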
Mirko Tavosanis | Sirio Papa
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

Valutazione umana di Google Traduttore e DeepL per le traduzioni di testi giornalistici dall’inglese verso l’italiano (Human evaluation of Google Translator and DeepL for translations of journalistic texts from English into Italian)
Mirko Tavosanis
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

2018

The ICoN Corpus of Academic Written Italian (L1 and L2)
Mirko Tavosanis | Federica Cominetti
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2012

Creation of a bottom-up corpus-based ontology for Italian Linguistics
Elisa Bianchi | Mirko Tavosanis | Emiliano Giovannetti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the construction of a shallow lexical ontology of Italian Linguistics, intended for use by a meta-search engine for query refinement. The ontology was built with Protégé 4.0.2 and is in OWL format; its construction followed the steps of the well-known Ontology Learning From Text (OLFT) layer cake. The starting point was automatic term extraction from a corpus of web documents on the domain of interest (304,000 words). Regarding corpus construction, we describe the main criteria for selecting web documents and the critical points of that selection, which concern the definition of the user profile and of degrees of specialisation. We then describe the process of term validation and the construction of a glossary of terms of Italian Linguistics. Afterwards, we outline the identification of synonymic chains and the main criteria of ontology design: the top classes of the ontology are Concept (containing the taxonomy of concepts) and Terms (containing the terms of the glossary as instances), while concepts are linked through part-whole and involved-role relations, both borrowed from WordNet. Finally, we show some examples of applying the ontology to query refinement.

2008

We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres.
