2024
pdf
bib
abs
Legal Text Reader Profiling: Evidences from Eye Tracking and Surprisal Based Analysis
Calogero J. Scozzaro
|
Davide Colla
|
Matteo Delsanto
|
Antonio Mastropaolo
|
Enrico Mensa
|
Luisa Revelli
|
Daniele P. Radicioni
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024
Reading movements and times are a precious cue to follow reader’s strategy, and to track the underlying effort in text processing. To date, many approaches are being devised to simplify texts to overcome difficulties stemming from sentences obscure, ambiguous or deserving clarification. In the legal domain, ensuring the clarity of norms and regulations is of the utmost importance, as the full understanding of such documents lies at the foundation of core social obligations and rights. This task requires determining which utterances and text excerpts are difficult for which (sort of) reader. This investigation is the aim of the present work. We propose a preliminary study based on eye-tracking data of 61 readers, with focus on individuating different reader profiles, and on predicting reading times of our readers.
2023
pdf
bib
abs
WikiBio: a Semantic Resource for the Intersectional Analysis of Biographical Events
Marco Antonio Stranisci
|
Rossana Damiano
|
Enrico Mensa
|
Viviana Patti
|
Daniele Radicioni
|
Tommaso Caselli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Biographical event detection is a relevant task that allows for the exploration and comparison of the ways in which people’s lives are told and represented. This may support several real-life applications in digital humanities and in works aimed at exploring bias about minoritized groups. Despite that, there are no corpora and models specifically designed for this task. In this paper we fill this gap by presenting a new corpus annotated for biographical event detection. The corpus, which includes 20 Wikipedia biographies, was aligned with 5 existing corpora in order to train a model for the biographical event detection task. The model was able to detect all mentions of the target-entity in a biography with an F-score of 0.808 and the entity-related events with an F-score of 0.859. Finally, the model was used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.
2022
pdf
bib
abs
Guidelines and a Corpus for Extracting Biographical Events
Marco Antonio Stranisci
|
Enrico Mensa
|
Rossana Damiano
|
Daniele Radicioni
|
Ousmane Diakite
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
Despite biographies are widely spread within the Semantic Web, resources and approaches to automatically extract biographical events are limited. Such limitation reduces the amount of structured, machine-readable biographical information, especially about people belonging to underrepresented groups. Our work challenges this limitation by providing a set of guidelines for the semantic annotation of life events. The guidelines are designed to be interoperable with existing ISO-standards for semantic annotation: ISO-TimeML (SO-24617-1), and SemAF (ISO-24617-4). Guidelines were tested through an annotation task of Wikipedia biographies of underrepresented writers, namely authors born in non-Western countries, migrants, or belonging to ethnic minorities. 1,000 sentences were annotated by 4 annotators with an average Inter-Annotator Agreement of 0.825. The resulting corpus was mapped on OntoNotes. Such mapping allowed to to expand our corpus, showing that already existing resources may be exploited for the biographical event extraction task.
2020
pdf
bib
abs
LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items
Davide Colla
|
Enrico Mensa
|
Daniele P. Radicioni
Computational Linguistics, Volume 46, Issue 2 - June 2020
We present LESSLEX, a novel multilingual lexical resource. Different from the vast majority of existing approaches, we ground our embeddings on a sense inventory made available from the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, for each term we have thus the “blended” terminological vector along with those describing all senses associated to that term. LESSLEX has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. We experimented over the principal data sets for such tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. We conclude by arguing that LESSLEX vectors may be relevant for practical applications and for research on conceptual and lexical access and competence.
2017
pdf
bib
abs
MERALI at SemEval-2017 Task 2 Subtask 1: a Cognitively Inspired approach
Enrico Mensa
|
Daniele P. Radicioni
|
Antonio Lieto
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
In this paper we report on the participation of the MERALI system to the SemEval Task 2 Subtask 1. The MERALI system approaches conceptual similarity through a simple, cognitively inspired, heuristics; it builds on a linguistic resource, the TTCS-e, that relies on BabelNet, NASARI and ConceptNet. The linguistic resource in fact contains a novel mixture of common-sense and encyclopedic knowledge. The obtained results point out that there is ample room for improvement, so that they are used to elaborate on present limitations and on future steps.
pdf
bib
abs
TTCSℰ: a Vectorial Resource for Computing Conceptual Similarity
Enrico Mensa
|
Daniele P. Radicioni
|
Antonio Lieto
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
In this paper we introduce the TTCSℰ, a linguistic resource that relies on BabelNet, NASARI and ConceptNet, that has now been used to compute the conceptual similarity between concept pairs. The conceptual representation herein provides uniform access to concepts based on BabelNet synset IDs, and consists of a vector-based semantic representation which is compliant with the Conceptual Spaces, a geometric framework for common-sense knowledge representation and reasoning. The TTCSℰ has been evaluated in a preliminary experimentation on a conceptual similarity task.