Tommaso Caselli - ACL Anthology

Tommaso Caselli

2026

Lexical Popularity: Quantifying the Impact of Pre-training for LLM Performance
Elena Sofia Ruzzetti | Fabio Massimo Zanzotto | Tommaso Caselli
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) excel in numerous and varied tasks. Yet, the mechanisms that underlie this success remain insufficiently understood. In particular, the size and the limited transparency of their pre-training materials make it difficult to state what the properties of the pre-training material are when compared to the test data. In this paper, we investigate whether LLMs learned generalized linguistic abstraction or rely on surface-level features, like lexical patterns, that match their pre-training data. We explore this by examining the relationship between lexical overlap of test data and task performance. We observe that lexical overlap with the pre-training material is mostly beneficial to model performance on tasks requiring functional linguistic knowledge. To further explore the impact of lexical features, we also demonstrate that LLMs are fragile with respect to lexical perturbations that preserve semantics. While we expected models to rely on lexical overlap between test instances and pre-training data for tasks requiring functional knowledge, lexical perturbations reveal that models also exhibit, to a lesser extent, this dependence for tasks requiring formal linguistic knowledge.

2025

HODIAT: A Dataset for Detecting Homotransphobic Hate Speech in Italian with Aggressiveness and Target Annotation
Greta Damo | Alessandra Teresa Cignarella | Tommaso Caselli | Viviana Patti | Debora Nozza
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

The escalating spread of homophobic and transphobic rhetoric in both online and offline spaces has become a growing global concern, with Italy standing out as one of the countries where acts of violence against LGBTQIA+ individuals persist and increase year after year. This short paper analyzes hateful language against LGBTQIA+ individuals in Italian using novel annotation labels for aggressiveness and target. We assess a range of multilingual and Italian language models on these new annotation layers across zero-shot, few-shot, and fine-tuning settings. The results reveal significant performance gaps across models and settings, highlighting the limitations of zero- and few-shot approaches and the importance of fine-tuning on labelled data, when available, to achieve high prediction performance.

TEXT-CAKE: Challenging Language Models on Local Text Coherence
Luca Dini | Dominique Brunato | Felice Dell’Orletta | Tommaso Caselli
Proceedings of the 31st International Conference on Computational Linguistics

We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different finetuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.

The “r” in “woman” stands for rights. Auditing LLMs in Uncovering Social Dynamics in Implicit Misogyny
Arianna Muti | Chris Emmery | Debora Nozza | Alberto Barrón-Cedeño | Tommaso Caselli
Findings of the Association for Computational Linguistics: EMNLP 2025

Persistent societal biases like misogyny express themselves more often implicitly than through openly hostile language. However, previous misogyny studies have focused primarily on explicit language, overlooking these more subtle forms. We bridge this gap by examining implicit misogynistic expressions in English and Italian. First, we develop a taxonomy of social dynamics, i.e., the underlying communicative intent behind misogynistic statements in social media data. Then, we test the ability of nine LLMs to identify the social dynamics as a multi-label classification and text span selection: first LLMs must choose social dynamics given a prefixed list, then they have to explicitly identify the text spans that triggered their decisions. We also investigate the extent of using different learning settings: zero- and few-shot, and prescriptive. Our analysis suggests that LLMs struggle to follow instructions and reason in all settings, mostly relying on semantic associations, recasting claims of emergent abilities.

Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection
Tommaso Caselli | Flor Miriam Plaza-del-Arco
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

In-context learning (ICL) has shown significant benefits, particularly in scenarios where large amounts of labeled data are unavailable. However, its effectiveness for highly subjective tasks, such as toxic language detection, remains an open question. A key challenge in this setting is to select shots to maximize performance. Although previous work has focused on enhancing variety and representativeness, the role of annotator disagreement in shot selection has received less attention. In this paper, we conduct an in-depth analysis of ICL using two families of open-source LLMs (Llama-3* and Qwen2.5) of varying sizes, evaluating their performance in five prominent English datasets covering multiple toxic language phenomena. We use disaggregated annotations and categorize different types of training examples to assess their impact on model predictions. We specifically investigate whether selecting shots based on annotators’ entropy – focusing on ambiguous or difficult examples – can improve generalization in LLMs. Additionally, we examine the extent to which the order of examples in prompts influences model behavior. Our results show that selecting shots based on entropy from annotator disagreement can enhance ICL performance. Specifically, ambiguous shots with a median entropy value generally lead to the best results for our selected LLMs in the few-shot setting. However, ICL often underperforms when compared to fine-tuned encoders.
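
As a rough illustration of entropy-guided shot selection, the sketch below scores each candidate example by the Shannon entropy of its annotator labels and keeps the examples closest to the pool's median entropy. It is only a minimal sketch under stated assumptions: the function names, the toy pool, and the "closest to the median" rule are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import median


def label_entropy(labels):
    """Shannon entropy (in bits) of one item's annotator labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_shots(pool, k):
    """Pick the k items whose entropy is closest to the pool's median entropy.

    `pool` is a list of (text, annotator_labels) pairs. The selection rule
    is an assumption made for illustration, not the authors' procedure.
    """
    scored = [(label_entropy(labels), text) for text, labels in pool]
    target = median(entropy for entropy, _ in scored)
    scored.sort(key=lambda pair: abs(pair[0] - target))
    return [text for _, text in scored[:k]]


# Hypothetical toy pool: full agreement, mild disagreement, even split.
pool = [
    ("example A", ["toxic", "toxic", "toxic", "toxic"]),
    ("example B", ["toxic", "toxic", "toxic", "ok"]),
    ("example C", ["toxic", "toxic", "ok", "ok"]),
]
shots = select_shots(pool, 1)
```

With this toy pool, the item with mild disagreement ("example B") sits at the median entropy and is selected first.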

Simulating Identity, Propagating Bias: Abstraction and Stereotypes in LLM-Generated Text
Pia Sommerauer | Giulia Rambelli | Tommaso Caselli
Findings of the Association for Computational Linguistics: EMNLP 2025

Persona-prompting is a growing strategy to steer LLMs toward simulating particular perspectives or linguistic styles through the lens of a specified identity. While this method is often used to personalize outputs, its impact on how LLMs represent social groups remains underexplored. In this paper, we investigate whether persona-prompting leads to different levels of linguistic abstraction, an established marker of stereotyping, when generating short texts linking socio-demographic categories with stereotypical or non-stereotypical attributes. Drawing on the Linguistic Expectancy Bias framework, we analyze outputs from six open-weight LLMs under three prompting conditions, comparing 11 persona-driven responses to those of a generic AI assistant. To support this analysis, we introduce Self-Stereo, a new dataset of self-reported stereotypes from Reddit. We measure abstraction through three metrics: concreteness, specificity, and negation. Our results highlight the limits of persona-prompting in modulating abstraction in language, confirming criticisms about the ecology of personas as representative of socio-demographic groups and raising concerns about the risk of propagating stereotypes even when seemingly evoking the voice of a marginalized group.

2024

Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts
Arianna Muti | Federico Ruggeri | Khalid Al Khatib | Alberto Barrón-Cedeño | Tommaso Caselli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to form a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short on reasoning capabilities about misogynistic comments and that they mostly rely on their implicit knowledge derived from internalized common stereotypes about women to generate implied assumptions, rather than on inductive reasoning.

VeryfIT - Benchmark of Fact-Checked Claims for Italian: A CALAMITA Challenge
Jacopo Gili | Viviana Patti | Lucia Passaro | Tommaso Caselli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Achieving factual accuracy is a known pending issue for language models. Their design, centered around the interactive component of user interaction and the extensive use of “spontaneous” training data, has made them highly adept at conversational tasks but not fully reliable in terms of factual correctness. VeryfIT addresses this issue by evaluating the in-memory factual knowledge of language models on data written by professional fact-checkers, posing it as a true or false question. Topics of the statements vary but most are in specific domains related to the Italian government, policies, and social issues. The task presents several challenges: extracting statements from segments of speeches, determining appropriate contextual relevance both temporally and factually, and ultimately verifying the accuracy of the statements.

EurekaRebus - Verbalized Rebus Solving with LLMs: A CALAMITA Challenge
Gabriele Sarti | Tommaso Caselli | Arianna Bisazza | Malvina Nissim
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Language games can be valuable resources for testing the ability of large language models (LLMs) to conduct challenging multi-step, knowledge-intensive inferences while respecting predefined constraints. Our proposed challenge prompts LLMs to reason step-by-step to solve verbalized variants of rebus games recently introduced with the EurekaRebus dataset. Verbalized rebuses replace visual cues with crossword definitions to create an encrypted first pass, making the problem entirely text-based. We introduce a simplified task variant with word length hints and adopt a comprehensive set of metrics to obtain a granular overview of models’ performance in knowledge recall, constraints adherence, and re-segmentation abilities across reasoning steps.

Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses
Gabriele Sarti | Tommaso Caselli | Malvina Nissim | Arianna Bisazza
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models’ performance. However, we find that performance gains from training are largely motivated by memorization. Our results suggest that rebus solving remains a challenging test bed to evaluate large language models’ linguistic proficiency and sequential instruction-following skills.

Assessing the Asymmetric Behaviour of Italian Large Language Models across Different Syntactic Structures
Elena Sofia Ruzzetti | Federico Ranaldi | Dario Onorati | Davide Venditti | Leonardo Ranaldi | Tommaso Caselli | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

While LLMs get more proficient at solving tasks and generating sentences, we aim to investigate the role that different syntactic structures have on models’ performances on a battery of Natural Language Understanding tasks. We analyze the performance of five LLMs on semantically equivalent sentences that are characterized by different syntactic structures. To correctly solve the tasks, a model is implicitly required to correctly parse the sentence. We found that LLMs struggle when there are more complex syntactic structures, with an average drop of 16.13 (±11.14) points in accuracy on the Q&A task. Additionally, we propose a method based on token attribution to spot which areas of the LLMs encode syntactic knowledge, by identifying model heads and layers responsible for the generation of a correct answer.

Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024
Pia Sommerauer | Tommaso Caselli | Malvina Nissim | Levi Remijnse | Piek Vossen
Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024

2023

SKAM at SemEval-2023 Task 10: Linguistic Feature Integration and Continuous Pretraining for Online Sexism Detection and Classification
Murali Manohar Kondragunta | Amber Chen | Karlo Slot | Sanne Weering | Tommaso Caselli
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Sexism has been prevalent online. In this paper, we explored the effect of explicit linguistic features and continuous pretraining on the performance of pretrained language models in sexism detection. While adding linguistic features did not improve the performance of the model, continuous pretraining did slightly boost the performance of the model in Task B from a mean macro-F1 score of 0.6156 to 0.6246. The best mean macro-F1 score in Task A was achieved by a finetuned HateBERT model using regular pretraining (0.8331). We observed that the linguistic features did not improve the model’s performance. At the same time, continuous pretraining proved beneficial only for nuanced downstream tasks like Task-B.

Benchmarking Offensive and Abusive Language in Dutch Tweets
Tommaso Caselli | Hylke Van Der Veen
The 7th Workshop on Online Abuse and Harms (WOAH)

We present an extensive evaluation of different fine-tuned models to detect instances of offensive and abusive language in Dutch across three benchmarks: a standard held-out test, a task-agnostic functional benchmark, and a dynamic test set. We also investigate the use of data cartography to identify high quality training data. Our results show a relatively good quality of the manually annotated data used to train the models while highlighting some critical weaknesses. We have also found a good portability of trained models along the same language phenomena. As for the data cartography, we have found a positive impact only on the functional benchmark and when selecting data per annotated dimension rather than using the entire training material.

Check-IT!: A Corpus of Expert Fact-checked Claims for Italian
Jacopo Gili | Lucia Passaro | Tommaso Caselli
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

RECESS: Resource for Extracting Cause, Effect, and Signal Spans
Fiona Anting Tan | Hansi Hettiarachchi | Ali Hürriyetoğlu | Nelleke Oostdijk | Tommaso Caselli | Tadashi Nomoto | Onur Uca | Farhana Ferdousi Liza | See-Kiong Ng
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Dynamic Stance: Modeling Discussions by Labeling the Interactions
Blanca Figueras | Irene Baucells | Tommaso Caselli
Findings of the Association for Computational Linguistics: EMNLP 2023

Stance detection is an increasingly popular task that has been mainly modeled as a static task, by assigning the expressed attitude of a text toward a given topic. Such a framing presents limitations, with trained systems showing poor generalization capabilities and being strongly topic-dependent. In this work, we propose modeling stance as a dynamic task, by focusing on the interactions between a message and its replies. For this purpose, we present a new annotation scheme that enables the categorization of all kinds of textual interactions. As a result, we have created a new corpus, the Dynamic Stance Corpus (DySC), consisting of three datasets in two middle-resourced languages: Catalan and Dutch. Our data analysis further supports our modeling decisions, empirically showing differences between the annotation of stance in static and dynamic contexts. We fine-tuned a series of monolingual and multilingual models on DySC, showing portability across topics and languages.

WikiBio: a Semantic Resource for the Intersectional Analysis of Biographical Events
Marco Antonio Stranisci | Rossana Damiano | Enrico Mensa | Viviana Patti | Daniele Radicioni | Tommaso Caselli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Biographical event detection is a relevant task that allows for the exploration and comparison of the ways in which people’s lives are told and represented. This may support several real-life applications in digital humanities and in works aimed at exploring bias about minoritized groups. Despite that, there are no corpora and models specifically designed for this task. In this paper we fill this gap by presenting a new corpus annotated for biographical event detection. The corpus, which includes 20 Wikipedia biographies, was aligned with 5 existing corpora in order to train a model for the biographical event detection task. The model was able to detect all mentions of the target-entity in a biography with an F-score of 0.808 and the entity-related events with an F-score of 0.859. Finally, the model was used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.

2022

How about Time? Probing a Multilingual Language Model for Temporal Relations
Tommaso Caselli | Irene Dini | Felice Dell’Orletta
Proceedings of the 29th International Conference on Computational Linguistics

This paper presents a comprehensive set of probing experiments using a multilingual language model, XLM-R, for temporal relation classification between events in four languages. Results show an advantage of contextualized embeddings over static ones and a detrimental role of sentence level embeddings. While obtaining competitive results against state-of-the-art systems, our probes indicate a lack of suitable encoded information to properly address this task.

The Causal News Corpus: Annotating Causal Relations in Event Sentences from News
Fiona Anting Tan | Ali Hürriyetoğlu | Tommaso Caselli | Nelleke Oostdijk | Tadashi Nomoto | Hansi Hettiarachchi | Iqra Ameer | Onur Uca | Farhana Ferdousi Liza | Tiancheng Hu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether they contain causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on the test set, and 83.46% in 5-fold cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.

Share and Shout: Proto-Slogans in Online Political Communities
Irene Russo | Gloria Comandini | Tommaso Caselli | Viviana Patti
Journal for Language Technology and Computational Linguistics, Vol. 35 No. 2

RUG-1-Pegasussers at SemEval-2022 Task 3: Data Generation Methods to Improve Recognizing Appropriate Taxonomic Word Relations
Frank van den Berg | Gijs Danoe | Esther Ploeger | Wessel Poelman | Lukas Edman | Tommaso Caselli
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system created for the SemEval 2022 Task 3: Presupposed Taxonomies - Evaluating Neural-network Semantics. This task is focused on correctly recognizing taxonomic word relations in English, French and Italian. We developed various data generation techniques that expand the originally provided train set and show that all methods increase the performance of models trained on these expanded datasets. Our final system outperformed the baseline system from the task organizers by achieving an average macro F1 score of 79.6 on all languages, compared to the baseline’s 67.4.

“Zo Grof !”: A Comprehensive Corpus for Offensive and Abusive Language in Dutch
Ward Ruitenbeek | Victor Zwart | Robin Van Der Noord | Zhenja Gnezdilov | Tommaso Caselli
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch. The corpus extends and revises an existing resource with more data and introduces a new annotation dimension for offensive language, making it a unique resource in the Dutch language panorama. Each language phenomenon (abusive and offensive language) in the corpus has been annotated with a multi-layer annotation scheme modelling the explicitness and the target(s) of the message. We have conducted a new set of experiments with different classification algorithms on all annotation dimensions. Monolingual Pre-Trained Language Models prove to be the best systems, obtaining a macro-average F1 of 0.828 for binary classification of offensive language, and 0.579 for the targets of offensive messages. Furthermore, the best system obtains a macro-average F1 of 0.667 for distinguishing between abusive and offensive messages.

SocioFillmore: A Tool for Discovering Perspectives
Gosse Minnema | Sara Gemelli | Chiara Zanchi | Tommaso Caselli | Malvina Nissim
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

SOCIOFILLMORE is a multilingual tool which helps to bring to the fore the focus or the perspective that a text expresses in depicting an event. Our tool, whose rationale we also support through a large collection of human judgements, is theoretically grounded on frame semantics and cognitive linguistics, and implemented using the LOME frame semantic parser. We describe SOCIOFILLMORE’s development and functionalities, show how non-NLP researchers can easily interact with the tool, and present some example case studies which are already incorporated in the system, together with the kind of analysis that can be visualised.

Event Causality Identification with Causal News Corpus - Shared Task 3, CASE 2022
Fiona Anting Tan | Hansi Hettiarachchi | Ali Hürriyetoğlu | Tommaso Caselli | Onur Uca | Farhana Ferdousi Liza | Nelleke Oostdijk
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

The Event Causality Identification Shared Task of CASE 2022 involved two subtasks working on the Causal News Corpus. Subtask 1 required participants to predict if a sentence contains a causal relation or not. This is a supervised binary classification task. Subtask 2 required participants to identify the Cause, Effect and Signal spans per causal sentence. This could be seen as a supervised sequence labeling task. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was done based on binary F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper summarizes the work of the 17 teams that submitted their results to our competition and 12 system description papers that were received. The best F1 scores achieved for Subtask 1 and 2 were 86.19% and 54.15%, respectively. All the top-performing approaches involved pre-trained language models fine-tuned to the targeted task. We further discuss these approaches and analyze errors across participants’ systems in this paper.
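
For readers unfamiliar with the two ranking metrics mentioned above, the following sketch computes a per-label F1 and an unweighted macro average over labels in plain Python. It is a minimal illustration assuming one gold and one predicted label per item; the function names are hypothetical, and the shared task's official scorer (especially for span labeling in Subtask 2) may compute its scores differently.

```python
def f1_for_label(gold, pred, label):
    """F1 for a single label, treating that label as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Hypothetical toy data: 1 = causal sentence, 0 = non-causal sentence.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
binary_f1 = f1_for_label(gold, pred, 1)  # F1 of the positive (causal) class
macro = macro_f1(gold, pred)             # mean of the F1 for labels 0 and 1
```

Binary F1 scores only the positive class, while macro F1 gives every label equal weight regardless of how often it occurs.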

Dead or Murdered? Predicting Responsibility Perception in Femicide News Reports
Gosse Minnema | Sara Gemelli | Chiara Zanchi | Tommaso Caselli | Malvina Nissim
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Different linguistic expressions can conceptualize the same event from different viewpoints by emphasizing certain participants over others. Here, we investigate a case where this has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic research in this area and conduct a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. We then train regression models that predict the salience of GBV participants with respect to different dimensions of perceived responsibility. Our best model (fine-tuned BERT) shows solid overall performance, with large differences between dimensions and participants: salient _focus_ is more predictable than salient _blame_, and perpetrators’ salience is more predictable than victims’ salience. Experiments with ridge regression models using different representations show that features based on linguistic theory perform similarly to word-based features. Overall, we show that different linguistic choices do trigger different perceptions of responsibility, and that such perceptions can be modelled automatically. This work can be a core instrument to raise awareness of the consequences of different perspectivizations in the general public and in news producers alike.

2021

MultiLexNorm: A Shared Task on Multilingual Lexical Normalization
Rob van der Goot | Alan Ramponi | Arkaitz Zubiaga | Barbara Plank | Benjamin Muller | Iñaki San Vicente Roncal | Nikola Ljubešić | Özlem Çetinoğlu | Rahmad Mahendra | Talha Çolakoğlu | Timothy Baldwin | Tommaso Caselli | Wladimir Sidorenko
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Lexical normalization is the task of transforming an utterance into its standardized form. This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation. Such variation is typical for social media on which information is shared in a multitude of ways, including diverse languages and code-switching. Since the seminal work of Han and Baldwin (2011) a decade ago, lexical normalization has attracted attention in English and multiple other languages. However, there exists a lack of a common benchmark for comparison of systems across languages with a homogeneous data and evaluation setup. The MultiLexNorm shared task sets out to fill this gap. We provide the largest publicly available multilingual lexical normalization benchmark including 13 language variants. We propose a homogenized evaluation setup with both intrinsic and extrinsic evaluation. As extrinsic evaluation, we use dependency parsing and part-of-speech tagging with adapted evaluation metrics (a-LAS, a-UAS, and a-POS) to account for alignment discrepancies. The shared task hosted at W-NUT 2021 attracted 9 participants and 18 submissions. The results show that neural normalization systems outperform the previous state-of-the-art system by a large margin. Downstream parsing and part-of-speech tagging performance is positively affected but to varying degrees, with improvements of up to 1.72 a-LAS, 0.85 a-UAS, and 1.54 a-POS for the winning system.

Frame Semantics for Social NLP in Italian: Analyzing Responsibility Framing in Femicide News Reports
Gosse Minnema | Sara Gemelli | Chiara Zanchi | Viviana Patti | Tommaso Caselli | Malvina Nissim
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

DALC: the Dutch Abusive Language Corpus
Tommaso Caselli | Arjan Schelhaas | Marieke Weultjes | Folkert Leistra | Hylke van der Veen | Gerben Timmerman | Malvina Nissim
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

As socially unacceptable language becomes pervasive in social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset with tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme modeling the explicitness and the target of the abusive messages. Baseline experiments on all annotation layers have been conducted, achieving a macro F1 score of 0.748 for binary classification of the explicitness layer and 0.489 for target classification.

With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic. Fighting this infodemic has been declared one of the most important focus areas of the World Health Organization, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreading xenophobia and panic. Addressing the issue requires solving a number of challenging problems such as identifying messages containing claims, determining their check-worthiness and factuality, and their potential to do harm as well as the nature of that harm, to mention just a few. To address this gap, we release a large dataset of 16K manually annotated tweets for fine-grained disinformation analysis that (i) focuses on COVID-19, (ii) combines the perspectives and the interests of journalists, fact-checkers, social media platforms, policy makers, and society, and (iii) covers Arabic, Bulgarian, Dutch, and English. Finally, we show strong evaluation results using pretrained Transformers, thus confirming the practical utility of the dataset in monolingual vs. multilingual, and single task vs. multitask settings.

A Multilingual Approach to Identify and Classify Exceptional Measures against COVID-19
Georgios Tziafas | Eugenie de Saint-Phalle | Wietse de Vries | Clara Egger | Tommaso Caselli
Proceedings of the Natural Legal Language Processing Workshop 2021

The COVID-19 pandemic has witnessed the implementations of exceptional measures by governments across the world to counteract its impact. This work presents the initial results of an on-going project, EXCEPTIUS, aiming to automatically identify, classify and compare exceptional measures against COVID-19 across 32 countries in Europe. To this goal, we created a corpus of legal documents with sentence-level annotations of eight different classes of exceptional measures that are implemented across these countries. We evaluated multiple multi-label classifiers on a manually annotated corpus at sentence level. The XLM-RoBERTa model achieves the highest performance on this multilingual multi-label classification task, with a macro-average F1 score of 59.8%.

Fighting the COVID-19 Infodemic with a Holistic BERT Ensemble
Georgios Tziafas | Konstantinos Kogkalidis | Tommaso Caselli
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper describes the TOKOFOU system, an ensemble model for misinformation detection tasks based on six different transformer-based pre-trained encoders, implemented in the context of the COVID-19 Infodemic Shared Task for English. We fine-tune each model on each of the task’s questions and aggregate their prediction scores using a majority voting approach. TOKOFOU obtains an overall F1 score of 89.7%, ranking first.

Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection
Tolúlope Ògúnremí | Nazanin Sabri | Valerio Basile | Tommaso Caselli
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

PROTEST-ER: Retraining BERT for Protest Event Extraction
Tommaso Caselli | Osman Mutlu | Angelo Basile | Ali Hürriyetoğlu
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

We analyze the effect of further retraining BERT with different domain specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large performance drops affecting data on the same text genres (e.g., news). We present PROTEST-ER, a retrained BERT model for protest event extraction. PROTEST-ER outperforms a corresponding generic BERT on out-of-domain data by 8.1 points. Our best performing models reach 51.91-46.39 F1 across both domains.

The Corpora They Are a-Changing: a Case Study in Italian Newspapers
Pierpaolo Basile | Annalina Caputo | Tommaso Caselli | Pierluigi Cassotti | Rossella Varvara
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

The use of automatic methods for the study of lexical semantic change (LSC) has led to the creation of evaluation benchmarks. Benchmark datasets, however, are intimately tied to the corpus used for their creation, which calls into question their reliability as well as the robustness of automatic methods. This contribution investigates these aspects showing the impact of unforeseen social and cultural dimensions. We also identify a set of additional issues (OCR quality, named entities) that impact the performance of the automatic methods, especially when used to discover LSC.

HateBERT: Retraining BERT for Abusive Language Detection in English
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

We introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have curated and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the retrained version on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the fine-tuned models across the datasets, suggesting that portability is affected by compatibility of the annotated phenomena.

Guiding Principles for Participatory Design-inspired Natural Language Processing
Tommaso Caselli | Roberto Cibin | Costanza Conforti | Enrique Encinas | Maurizio Teli
Proceedings of the 1st Workshop on NLP for Positive Impact

We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1–3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), which build on top of situations familiar to the authors, to elicit the identification of the other 6. Principles 4–6 describe the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and the deployment & evaluation. Finally, principles 7–9 guide a new reflexivity of the NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road-map to develop a new generation of PD-inspired NLP.

2020

GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models
Davide Colla | Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach exploiting the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).

Topic and Emotion Development among Dutch COVID-19 Twitter Communities in the early Pandemic
Boris Marinov | Jennifer Spenader | Tommaso Caselli
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media

The paper focuses on a large collection of Dutch tweets from the Netherlands to get an insight into the perception and reactions of users during the early months of the COVID-19 pandemic. We focused on five major communities of users: government and health organizations, news media, politicians, the general public and conspiracy theory supporters, investigating differences among them in topic dominance and the expressions of emotions. Through topic modeling we monitor the evolution of the conversation about COVID-19 among these communities. Our results indicate that the national focus on COVID-19 shifted from the virus itself to its impact on the economy between February and April. Surprisingly, the overall emotional public response appears to be substantially positive and expressing trust, although differences can be observed in specific groups of users.

I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language
Tommaso Caselli | Valerio Basile | Jelena Mitrović | Inga Kartoziya | Michael Granitzer
Proceedings of the Twelfth Language Resources and Evaluation Conference

Abusive language detection is an unsolved and challenging problem for the NLP community. Recent literature suggests various approaches to distinguish between different language phenomena (e.g., hate speech vs. cyberbullying vs. offensive language) and factors (degree of explicitness and target) that may help to classify different abusive language phenomena. There are data sets that annotate the target of abusive messages (i.e., OLID/OffensEval (Zampieri et al., 2019a)). However, there is a lack of data sets that take into account the degree of explicitness. In this paper, we propose annotation guidelines to distinguish between explicit and implicit abuse in English and apply them to OLID/OffensEval. The outcome is a newly created resource, AbuseEval v1.0, which aims to address some of the existing issues in the annotation of offensive and abusive language (e.g., explicitness of the message, presence of a target, need of context, and interaction across different phenomena).

Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events
Claire Bonial | Tommaso Caselli | Snigdha Chaturvedi | Elizabeth Clark | Ruihong Huang | Mohit Iyyer | Alejandro Jaimes | Heng Ji | Lara J. Martin | Ben Miller | Teruko Mitamura | Nanyun Peng | Joel Tetreault
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing
Rob van der Goot | Alan Ramponi | Tommaso Caselli | Michele Cafagna | Lorenzo De Mattei
Proceedings of the Twelfth Language Resources and Evaluation Conference

Lexical normalization is the task of translating non-standard social media data to a standard form. Previous work has shown that this is beneficial for many downstream tasks in multiple languages. However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data. In this paper, we discuss the creation of a lexical normalization dataset for Italian. After two rounds of annotation, a Cohen’s kappa score of 78.64 is obtained. During this process, we also analyze the inter-annotator agreement for this task, which is only rarely done on datasets for lexical normalization, and when it is reported, the analysis usually remains shallow. Furthermore, we utilize this dataset to train a lexical normalization model and show that it can be used to improve dependency parsing of social media data. All annotated data and the code to reproduce the results are available at: http://bitbucket.org/robvanderg/normit.

Lower Bias, Higher Density Abusive Language Datasets: A Recipe
Juliet van Rosendaal | Tommaso Caselli | Malvina Nissim
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

Datasets to train models for abusive language detection are at the same time necessary and still scarce. One of the reasons for their limited availability is the cost of their creation. Not only is manual annotation expensive, but the phenomenon is also sparse, forcing human annotators to go through a large number of irrelevant examples in order to obtain some significant data. Strategies used until now to increase density of abusive language and obtain more meaningful data overall include data filtering on the basis of pre-selected keywords and hate-rich sources of data. We suggest a recipe that at the same time can provide meaningful data with possibly higher density of abusive language and also reduce top-down biases imposed by corpus creators in the selection of the data to annotate. More specifically, we exploit the controversy channel on Reddit to obtain keywords that are used to filter a Twitter dataset. While the method needs further validation and refinement, our preliminary experiments show a higher density of abusive tweets in the filtered vs unfiltered dataset, and a more meaningful topic distribution after filtering.

A Diachronic Italian Corpus based on “L’Unità”
Pierpaolo Basile | Annalina Caputo | Tommaso Caselli | Pierluigi Cassotti | Rossella Varvara
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

There and Back Again: Cross-Lingual Transfer Learning for Event Detection
Tommaso Caselli | Ahmet Üstün
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

2018

Systems’ Agreements and Disagreements in Temporal Processing: An Extensive Error Analysis of the TempEval-3 Task
Tommaso Caselli | Roser Morante
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Italian Event Detection Goes Deep Learning
Tommaso Caselli
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Source-driven Representations for Hate Speech Detection
Flavio Merenda | Claudia Zaghi | Tommaso Caselli | Malvina Nissim
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

The Circumstantial Event Ontology (CEO) and ECB+/CEO: an Ontology and Corpus for Implicit Causal Relations between Events
Roxane Segers | Tommaso Caselli | Piek Vossen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Proceedings of the Workshop Events and Stories in the News 2018
Tommaso Caselli | Ben Miller | Marieke van Erp | Piek Vossen | Martha Palmer | Eduard Hovy | Teruko Mitamura | David Caswell | Susan W. Brown | Claire Bonial
Proceedings of the Workshop Events and Stories in the News 2018

Crowdsourcing StoryLines: Harnessing the Crowd for Causal Relation Annotation
Tommaso Caselli | Oana Inel
Proceedings of the Workshop Events and Stories in the News 2018

This paper describes a crowdsourcing experiment on the annotation of plot-like structures in English news articles. CrowdTruth methodology and metrics have been applied to select valid annotations from the crowd. We further run an in-depth analysis of the annotated data by comparing them with available expert data. Our results show a valuable use of crowdsourcing annotations for such complex semantic tasks, and suggest a new annotation approach which combines crowd and experts.

2017

The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts
Rachele Sprugnoli | Tommaso Caselli | Sara Tonelli | Giovanni Moretti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset are described together with two sets of classification experiments.

Proceedings of the Events and Stories in the News Workshop
Tommaso Caselli | Ben Miller | Marieke van Erp | Piek Vossen | Martha Palmer | Eduard Hovy | Teruko Mitamura | David Caswell
Proceedings of the Events and Stories in the News Workshop

The Circumstantial Event Ontology (CEO)
Roxane Segers | Tommaso Caselli | Piek Vossen
Proceedings of the Events and Stories in the News Workshop

In this paper we describe the ongoing work on the Circumstantial Event Ontology (CEO), a newly developed ontology for calamity events that models semantic circumstantial relations between event classes. The circumstantial relations are designed manually, based on the shared properties of each event class. We discuss and contrast two types of event circumstantial relations: semantic circumstantial relations and episodic circumstantial relations. Further, we show the metamodel and the current contents of the ontology and outline the evaluation of the CEO.

Predicting Controversial News Using Facebook Reactions
Angelo Basile | Tommaso Caselli | Malvina Nissim
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction
Tommaso Caselli | Piek Vossen
Proceedings of the Events and Stories in the News Workshop

This paper reports on the Event StoryLine Corpus (ESC) v1.0, a new benchmark dataset for temporal and causal relation detection. By developing this dataset, we also introduce a new task, the StoryLine Extraction from news data, which aims at extracting and classifying events relevant for stories, from across news documents spread in time and clustered around a single seminal event or topic. In addition to describing the dataset, we also report on three baseline systems whose results show the complexity of the task and suggest directions for the development of more robust systems.

2016

VUACLTL at SemEval 2016 Task 12: A CRF Pipeline to Clinical TempEval
Tommaso Caselli | Roser Morante
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

GRaSP: A Multilayered Annotation Scheme for Perspectives
Chantal van Son | Tommaso Caselli | Antske Fokkens | Isa Maks | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a framework and methodology for the annotation of perspectives in text. In the last decade, different aspects of linguistic encoding of perspectives have been targeted as separated phenomena through different annotation initiatives. We propose an annotation scheme that integrates these different phenomena. We use a multilayered annotation approach, splitting the annotation of different aspects of perspectives into small subsequent subtasks in order to reduce the complexity of the task and to better monitor interactions between layers. Currently, we have included four layers of perspective annotation: events, attribution, factuality and opinion. The annotations are integrated in a formal model called GRaSP, which provides the means to represent instances (e.g. events, entities) and propositions in the (real or assumed) world in relation to their mentions in text. Then, the relation between the source and target of a perspective is characterized by means of perspective annotations. This enables us to place alternative perspectives on the same entity, event or proposition next to each other.

Crowdsourcing Salient Information from News and Tweets
Oana Inel | Tommaso Caselli | Lora Aroyo
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The increasing streams of information pose challenges to both humans and machines. On the one hand, humans need to identify relevant information and consume only the information that matches their interests. On the other hand, machines need to understand the information that is published in online data streams and generate concise and meaningful overviews. We consider events as prime factors to query for information and generate meaningful context. The focus of this paper is to acquire empirical insights for identifying salience features in tweets and news about a target event, i.e., the event of “whaling”. We first derive a methodology to identify such features by building up a knowledge space of the event enriched with relevant phrases, sentiments and ranked by their novelty. We applied this methodology on tweets and we have performed preliminary work towards adapting it to news articles. Our results show that crowdsourcing text relevance, sentiments and novelty (1) can be a main step in identifying salient information, and (2) provides a deeper and more precise understanding of the data at hand compared to state-of-the-art approaches.

Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)
Tommaso Caselli | Ben Miller | Marieke van Erp | Piek Vossen | David Caswell
Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)

Temporal Information Annotation: Crowd vs. Experts
Tommaso Caselli | Rachele Sprugnoli | Oana Inel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted of two subtasks: one devoted to the recognition of events and temporal expressions and one to the detection and classification of temporal relations. The outcomes of the experiments suggest a valuable use of crowdsourcing annotations also for a complex task like Temporal Processing.

The Storyline Annotation and Representation Scheme (StaR): A Proposal
Tommaso Caselli | Piek Vossen
Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016)

NLP and Public Engagement: The Case of the Italian School Reform
Tommaso Caselli | Giovanni Moretti | Rachele Sprugnoli | Sara Tonelli | Damien Lanfrey | Donatella Solda Kutzmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present PIERINO (PIattaforma per l’Estrazione e il Recupero di INformazione Online), a system that was implemented in collaboration with the Italian Ministry of Education, University and Research to analyse the citizens’ comments given in the #labuonascuola survey. The platform includes various levels of automatic analysis such as key-concept extraction and word co-occurrences. Each analysis is displayed through an intuitive view using different types of visualizations, for example radar charts and sunburst. PIERINO was effectively used to support shaping the last Italian school reform, proving the potential of NLP in the context of policy making.

Unshared Task at the 3rd Workshop on Argument Mining: Perspective Based Local Agreement and Disagreement in Online Debate
Chantal van Son | Tommaso Caselli | Antske Fokkens | Isa Maks | Roser Morante | Lora Aroyo | Piek Vossen
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

2015

Storylines for structuring massive streams of news
Piek Vossen | Tommaso Caselli | Yiota Kontzopoulou
Proceedings of the First Workshop on Computing News Storylines

SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events
Irene Russo | Tommaso Caselli | Carlo Strapparava
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

SPINOZA_VU: An NLP Pipeline for Cross Document TimeLines
Tommaso Caselli | Antske Fokkens | Roser Morante | Piek Vossen
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Proceedings of the First Workshop on Computing News Storylines
Tommaso Caselli | Marieke van Erp | Anne-Lyse Minard | Mark Finlayson | Ben Miller | Jordi Atserias | Alexandra Balahur | Piek Vossen
Proceedings of the First Workshop on Computing News Storylines

2014

Enriching the “Senso Comune” Platform with Automatically Acquired Data
Tommaso Caselli | Laure Vieu | Carlo Strapparava | Guido Vetere
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper reports on research activities on automatic methods for the enrichment of the Senso Comune platform. At this stage of development, we will report on two tasks, namely word sense alignment with MultiWordNet and automatic acquisition of Verb Shallow Frames from sense annotated data in the MultiSemCor corpus. The results obtained are satisfying. We achieved a final F-measure of 0.64 for noun sense alignment and an F-measure of 0.47 for verb sense alignment, and an accuracy of 68% on the acquisition of Verb Shallow Frames.

FBK-TR: Applying SVM with Multiple Linguistic Features for Cross-Level Semantic Similarity
Ngoc Phuoc An Vo | Tommaso Caselli | Octavian Popescu
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

FBK-TR: SVM for Semantic Relatedeness and Corpus Patterns for RTE
Ngoc Phuoc An Vo | Octavian Popescu | Tommaso Caselli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Automatic Domain Assignment for Word Sense Alignment
Tommaso Caselli | Carlo Strapparava
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aligning an Italian WordNet with a Lexicographic Dictionary: Coping with limited data
Tommaso Caselli | Carlo Strapparava | Laure Vieu | Guido Vetere
Proceedings of the Seventh Global Wordnet Conference

2013

Aligning Verb Senses in Two Italian Lexical Semantic Resources
Tommaso Caselli | Carlo Strapparava | Laure Vieu | Guido Vetere
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

From Glosses to Qualia: Qualia Extraction from Senso Comune
Tommaso Caselli | Irene Russo
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

2012

Assigning Connotation Values to Events
Tommaso Caselli | Irene Russo | Francesco Rubino
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Sentiment Analysis (SA) and Opinion Mining (OM) have become popular tasks in recent years in NLP with the development of language resources, corpora and annotation schemes. The possibility to discriminate between objective and subjective expressions contributes to the identification of a document's semantic orientation and to the detection of the opinions and sentiments expressed by the authors or attributed to other participants in the document. Subjectivity word sense disambiguation helps in this task, automatically determining which word senses in a corpus are being used subjectively and which are being used objectively. This paper reports on a methodology to assign in a semi-automatic way connotative values to eventive nouns usually labelled as neutral through syntagmatic patterns that express cause-effect relations between emotion cause events and emotion words. We have applied our method to nouns and we have been able to reduce the number of OBJ polarity values associated with event nouns.

Sourcing the Crowd for a Few Good Ones: Event Type Detection
Tommaso Caselli | Chu-Ren Huang
Proceedings of COLING 2012: Posters

Customizable SCF Acquisition in Italian
Tommaso Caselli | Francesco Rubino | Francesca Frontini | Irene Russo | Valeria Quochi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Lexica of predicate-argument structures constitute a useful tool for several tasks in NLP. This paper describes a web-service system for automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for the evaluation of the system, the first by mixing together information from two lexica (one manually created and the second automatically acquired) and manual exploration of corpus data and the other annotating data extracted from a specialized corpus (environmental domain). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). The evaluation phase has allowed us to identify the best empirical MLE threshold for the creation of a lexicon (P=0.653, R=0.557, F1=0.601). In addition to this, we assigned to the extracted entries of the lexicon a confidence score based on the relative frequency and evaluated the extractor on domain specific data. The confidence score will allow the final user to easily select the entries of the lexicon in terms of their reliability: one of the most interesting features of this work is the possibility the final users have to customize the results of the SCF extractor, obtaining different SCF lexica in terms of size and accuracy.

2011

Annotating Events, Temporal Expressions and Relations in Italian: the It-Timeml Experience for the Ita-TimeBank
Tommaso Caselli | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Emanuele Pianta | Irina Prodanof
Proceedings of the 5th Linguistic Annotation Workshop

EMOCause: An Easy-adaptable Approach to Extract Emotion Cause Contexts
Irene Russo | Tommaso Caselli | Francesco Rubino | Ester Boldrini | Patricio Martínez-Barco
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

Data-Driven Approach Using Semantics for Recognizing and Classifying TimeML Events in Italian
Tommaso Caselli | Hector Llorens | Borja Navarro-Colorado | Estela Saquete
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

SemEval-2010 Task 13: TempEval-2
Marc Verhagen | Roser Saurí | Tommaso Caselli | James Pustejovsky
Proceedings of the 5th International Workshop on Semantic Evaluation

Annotating Event Anaphora: A Case Study
Tommaso Caselli | Irina Prodanof
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In recent years we have registered a renewed interest in event detection and temporal processing of text/discourse. TimeML (Pustejovsky et al., 2003a) has shed new light on the notion of event and developed a new methodology for its annotation. In parallel, works on anaphora resolution have developed a reliable methodology for the annotation and pointed out the core role of this phenomenon for the improvement of NLP systems. This paper tries to put together these two lines of research by describing a case study for the creation of an annotation scheme on event anaphora. We claim that this work could have consequences for the annotation of eventualities as proposed in TimeML and on the use of the tag and on the study of anaphora and its annotation. The annotation scheme and its guidelines have been developed on the basis of a coarse grained bottom up approach. In order to do this, we have performed a small sampling annotation which has highlighted shortcomings and open issues which need to be resolved.

2008

UFRA: a UIMA-based Approach to Federated Language Resource Architecture
Riccardo Del Gratta | Roberto Bartolini | Tommaso Caselli | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we address the issue of developing an interoperable infrastructure for language resources and technologies. In our approach, called UFRA, we extend the Federated Database Architecture System adding typical functionalities coming from UIMA. In this way, we capitalize on the advantages of a federated architecture, such as autonomy, heterogeneity and distribution of components, monitored by a central authority responsible for checking both the integration of components and user rights on performing different tasks. We use the UIMA approach to manage and define one common front-end, enabling users and clients to query, retrieve and use language resources and technologies. The purpose of this paper is to show how UIMA leads from a Federated Database Architecture to a Federated Resource Architecture, adding to a registry of available components both static resources such as lexicons and corpora and dynamic ones such as tools and general purpose language technologies. At the end of the paper, we present a case-study that adopts this framework to integrate the SIMPLE lexicon and TIMEML annotation guidelines to tag natural language texts.

A Bilingual Corpus of Inter-linked Events
Tommaso Caselli | Nancy Ide | Roberto Bartolini
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the creation of a bilingual corpus of inter-linked events for Italian and English. Linkage is accomplished through the Inter-Lingual Index (ILI) that links ItalWordNet with WordNet. The availability of this resource, on the one hand, enables contrastive analysis of the linguistic phenomena surrounding events in both languages, and on the other hand, can be used to perform multilingual temporal analysis of texts. In addition to describing the methodology for construction of the inter-linked corpus and the analysis of the data collected, we demonstrate that the ILI could potentially be used to bootstrap the creation of comparable corpora by exporting layers of annotation for words that have the same sense.

2007

Inferring the Semantics of Temporal Prepositions in Italian
Tommaso Caselli | Valeria Quochi
Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions

2006

Annotating Bridging Anaphors in Italian: in Search of Reliability
Tommaso Caselli | Irina Prodanof
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This work presents and gives a preliminary evaluation of an XML annotation scheme for marking bridging anaphors of the form definite article + N in Italian. The scheme is based on a corpus study. The data collected in the evaluation experiment seem to support the reliability of the scheme, although some problems remain open.

Co-authors

Viviana Patti 5

Carlo Strapparava 5

Valerio Basile 4

Ali Hürriyetoğlu 4

Rachele Sprugnoli 4

Marieke van Erp 4

David Caswell 3

Antske Fokkens 3

Michael Granitzer 3

Hansi Hettiarachchi 3

Farhana Ferdousi Liza 3

Gosse Minnema 3

Teruko Mitamura 3

Jelena Mitrović 3

Nelleke Oostdijk 3

Irina Prodanof 3

Francesco Rubino 3

Fiona Anting Tan 3

Chiara Zanchi 3

Alberto Barrón-Cedeño 2

Roberto Bartolini 2

Angelo Basile 2

Pierpaolo Basile 2

Arianna Bisazza 2

Claire Bonial 2

Annalina Caputo 2

Pierluigi Cassotti 2

Felice Dell’Orletta 2

Rob Van Der Goot 2

Giovanni Moretti 2

Tadashi Nomoto 2

Martha Palmer 2

Lucia Passaro 2

Octavian Popescu 2

Valeria Quochi 2

Elena Sofia Ruzzetti 2

Gabriele Sarti 2

Roxane Segers 2

Pia Sommerauer 2

Georgios Tziafas 2

Hylke Van Der Veen 2

Rossella Varvara 2

Ngoc Phuoc An Vo 2

Fabio Massimo Zanzotto 2

Chantal van Son 2

Ahmed Abdelali 1

Abdulaziz Al-Homaid 1

Jordi Atserias 1

Alexandra Balahur 1

Timothy Baldwin 1

Valentina Bartalesi Lenzi 1

Irene Baucells 1

Ester Boldrini 1

Susan Windisch Brown 1

Dominique Brunato 1

Britt Bruntink 1

Michele Cafagna 1

Nicoletta Calzolari 1

Snigdha Chaturvedi 1

Roberto Cibin 1

Alessandra Teresa Cignarella 1

Elizabeth Clark 1

Gloria Comandini 1

Costanza Conforti 1

Giovanni Da San Martino 1

Rossana Damiano 1

Kareem Darwish 1

Riccardo Del Gratta 1

Nadir Durrani 1

Enrique Encinas 1

Blanca Figueras 1

Mark Finlayson 1

Francesca Frontini 1

Zhenja Gnezdilov 1

Ruihong Huang 1

Chu-Ren Huang 1

Alejandro Jaimes 1

Inga Kartoziya 1

Khalid Al Khatib 1

Konstantinos Kogkalidis 1

Murali Manohar Kondragunta 1

Yiota Kontzopoulou 1

Donatella Solda Kutzmann 1

Damien Lanfrey 1

Folkert Leistra 1

Nikola Ljubešić 1

Héctor Llorens 1

Rahmad Mahendra 1

Boris Marinov 1

Lara J. Martin 1

Patricio Martínez-Barco 1

Lorenzo De Mattei 1

Flavio Merenda 1

Anne-Lyse Minard 1

Monica Monachini 1

Hamdy Mubarak 1

Benjamin Muller 1

Preslav Nakov 1

Borja Navarro 1

Dario Onorati 1

Emanuele Pianta 1

Barbara Plank 1

Flor Miriam Plaza-del-Arco 1

Esther Ploeger 1

Wessel Poelman 1

James Pustejovsky 1

Daniele P. Radicioni 1

Giulia Rambelli 1

Federico Ranaldi 1

Leonardo Ranaldi 1

Levi Remijnse 1

Federico Ruggeri 1

Ward Ruitenbeek 1

Nazanin Sabri 1

Hassan Sajjad 1

Iñaki San Vicente Roncal 1

Estela Saquete 1

Arjan Schelhaas 1

Wladimir Sidorenko 1

Claudia Soria 1

Jennifer Spenader 1

Marco Antonio Stranisci 1

Maurizio Teli 1

Joel Tetreault 1

Gerben Timmerman 1

Frank Van Den Berg 1

Robin Van Der Noord 1

Davide Venditti 1

Marc Verhagen 1

Sanne Weering 1

Marieke Weultjes 1

Claudia Zaghi 1

Wajdi Zaghouani 1

Arkaitz Zubiaga 1

Eugenie de Saint-Phalle 1

Wietse de Vries 1

Juliet van Rosendaal 1

Özlem Çetinoğlu 1

Talha Çolakoğlu 1

Tolúlọpẹ́ Ògúnrẹ̀mí 1

Ahmet Üstün 1

Venues