Davide Colla


2024

Legal Text Reader Profiling: Evidences from Eye Tracking and Surprisal Based Analysis
Calogero J. Scozzaro | Davide Colla | Matteo Delsanto | Antonio Mastropaolo | Enrico Mensa | Luisa Revelli | Daniele P. Radicioni
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

Reading movements and times are a precious cue for following a reader’s strategy and for tracking the underlying effort in text processing. Many approaches are currently being devised to simplify texts and overcome the difficulties stemming from sentences that are obscure, ambiguous, or in need of clarification. In the legal domain, ensuring the clarity of norms and regulations is of the utmost importance, as the full understanding of such documents lies at the foundation of core social obligations and rights. This task requires determining which utterances and text excerpts are difficult for which (sort of) reader. This investigation is the aim of the present work. We propose a preliminary study based on eye-tracking data from 61 readers, with a focus on identifying different reader profiles and on predicting the reading times of our readers.

2023

EliCoDe at MultiGED2023: fine-tuning XLM-RoBERTa for multilingual grammatical error detection
Davide Colla | Matteo Delsanto | Elisa Di Nuovo
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning

2020

LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items
Davide Colla | Enrico Mensa | Daniele P. Radicioni
Computational Linguistics, Volume 46, Issue 2 - June 2020

We present LESSLEX, a novel multilingual lexical resource. Unlike the vast majority of existing approaches, we ground our embeddings on a sense inventory made available by the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, for each term we have the “blended” terminological vector along with the vectors describing all senses associated with that term. LESSLEX has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. We experimented on the principal data sets for these tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. We conclude by arguing that LESSLEX vectors may be relevant for practical applications and for research on conceptual and lexical access and competence.

GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models
Davide Colla | Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach that exploits the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).