Dominique Brunato - ACL Anthology

2025

The Role of Eye-Tracking Data in Encoder-Based Models: An In-depth Linguistic Analysis
Lucia Domenichelli | Luca Dini | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

TEXT-CAKE: Challenging Language Models on Local Text Coherence
Luca Dini | Dominique Brunato | Felice Dell’Orletta | Tommaso Caselli
Proceedings of the 31st International Conference on Computational Linguistics

We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different finetuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.

Learning from Impairment: Leveraging Insights from Clinical Linguistics in Language Modelling Research
Dominique Brunato
Proceedings of the 31st International Conference on Computational Linguistics

This position paper investigates the potential of integrating insights from language impairment research and its clinical treatment to develop human-inspired learning strategies and evaluation frameworks for language models (LMs). We inspect the theoretical underpinnings of some influential linguistically motivated training approaches derived from neurolinguistics and, particularly, aphasiology, aimed at enhancing the recovery and generalization of linguistic skills in aphasia treatment, with a primary focus on those targeting the syntactic domain. We highlight how these insights can inform the design of rigorous assessments for LMs, specifically in their handling of complex syntactic phenomena, as well as their implications for developing human-like learning strategies, aligning with efforts to create more sustainable and cognitively plausible natural language processing (NLP) models.

Direct and Indirect Interpretations of Speech Acts: Evidence from Human Judgments and Large Language Models
Massimiliano Orsini | Dominique Brunato
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Exploring LLM-Based Assessment of Italian Middle School Writing: A Pilot Study
Adriana Mirabella | Dominique Brunato
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

This study investigates the use of ChatGPT for Automated Essay Scoring (AES) in assessing Italian middle school students’ written texts. Using rubrics targeting grammar, coherence and argumentation, we compare AI-generated feedback with that of a human teacher on a newly collected corpus of students’ essays. Despite some differences, ChatGPT provided detailed and timely feedback that complements the teacher’s role. These findings underscore the potential of generative AI to improve the assessment of writing, providing useful insights for educators and supporting students in developing their writing skills.

A Novel Real-World Dataset of Italian Clinical Notes for NLP-based Decision Support in Low Back Pain Treatment
Agnese Bonfigli | Ruben Piperno | Luca Bacco | Felice Dell’Orletta | Dominique Brunato | Filippo Crispino | Giuseppe Francesco Papalia | Fabrizio Russo | Gianluca Vadalà | Rocco Papalia | Mario Merone | Leandro Pecchia
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

From Human Reading to NLM Understanding: Evaluating the Role of Eye-Tracking Data in Encoder-Based Models
Luca Dini | Lucia Domenichelli | Dominique Brunato | Felice Dell’Orletta
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cognitive signals, particularly eye-tracking data, offer valuable insights into human language processing. Leveraging eye-gaze data from the Ghent Eye-Tracking Corpus, we conducted a series of experiments to examine how integrating knowledge of human reading behavior impacts Neural Language Models (NLMs) across multiple dimensions: task performance, attention mechanisms, and the geometry of their embedding space. We explored several fine-tuning methodologies to inject eye-tracking features into the models. Our results reveal that incorporating these features does not degrade downstream task performance, enhances alignment between model attention and human attention patterns, and compresses the geometry of the embedding space.

2024

TRACE-it: Testing Relative clAuses Comprehension through Entailment in ITalian: A CALAMITA Challenge
Dominique Brunato
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Introduced in the context of CALAMITA 2024, TRACE-it (Testing Relative clAuses Comprehension through Entailment in ITalian) is a benchmark designed to evaluate the ability of Large Language Models (LLMs) to comprehend a specific type of complex syntactic construction in Italian: object relative clauses. In this report, we outline the theoretical framework that informed the creation of the dataset and provide a comprehensive overview of the linguistic materials used.

Hits or Misses? A Linguistically Explainable Formula for Fanfiction Success
Giulio Leonardi | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

This study presents a computational analysis of Italian fanfiction, aiming to construct an interpretable model of successful writing within this emerging literary domain. Leveraging explicit features that capture both linguistic style and semantic content, we demonstrate the feasibility of automatically predicting successful writing in fanfiction and we identify a set of robust linguistic predictors that maintain their predictive power across diverse topics and time periods, offering insights into the universal aspects of engaging storytelling. This approach not only enhances our understanding of fanfiction as a genre but also offers potential applications in broader literary analysis and content creation.

2023

Coherent or Not? Stressing a Neural Language Model for Discourse Coherence in Multiple Languages
Dominique Brunato | Felice Dell’Orletta | Irene Dini | Andrea Amelio Ravelli
Findings of the Association for Computational Linguistics: ACL 2023

In this study, we investigate the capability of a Neural Language Model (NLM) to distinguish between coherent and incoherent text, where the latter has been artificially created to gradually undermine local coherence within text. While previous research on coherence assessment using NLMs has primarily focused on English, we extend our investigation to multiple languages. We employ a consistent evaluation framework to compare the performance of monolingual and multilingual models in both in-domain and out-domain settings. Additionally, we explore the model’s performance in a cross-language scenario.

Unraveling Text Coherence from the Human Perspective: a Novel Dataset for Italian
Federica Papa | Luca Dini | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2022

SemEval-2022 Task 3: PreTENS-Evaluating Neural Networks on Presuppositional Semantic Knowledge
Roberto Zamparelli | Shammur Chowdhury | Dominique Brunato | Cristiano Chesi | Felice Dell’Orletta | Md. Arid Hasan | Giulia Venturi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

We report the results of the SemEval 2022 Task 3, PreTENS, on evaluating the acceptability of simple sentences containing constructions whose two arguments are presupposed to be or not to be in an ordered taxonomic relation. The task featured two sub-tasks articulated as: (i) binary prediction task and (ii) regression task, predicting the acceptability on a continuous scale. The sentences were artificially generated in three languages (English, Italian and French). 21 systems, with 8 system papers, were submitted for the task, all based on various types of fine-tuned transformer systems, often with ensemble methods and various data augmentation techniques. The best systems reached an F1-macro score of 94.49 (sub-task1) and a Spearman correlation coefficient of 0.80 (sub-task2), with interesting variations in specific constructions and/or languages.

2021

What Makes My Model Perplexed? A Linguistic Investigation on Neural Language Models Perplexity
Alessio Miaschi | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

This paper presents an investigation aimed at studying how the linguistic structure of a sentence affects the perplexity of two of the most popular Neural Language Models (NLMs), BERT and GPT-2. We first compare the sentence-level likelihood computed with BERT and GPT-2’s perplexity, showing that the two metrics are correlated. In addition, we exploit linguistic features capturing a wide set of morpho-syntactic and syntactic phenomena, showing how they contribute to predicting the perplexity of the two NLMs.

Probing Tasks Under Pressure
Alessio Miaschi | Chiara Alzetta | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

Quale testo è scritto meglio? A Study on Italian Native Speakers’ Perception of Writing Quality
Aldo Cerulli | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

On the Role of Textual Connectives in Sentence Comprehension: A New Dataset for Italian
Giorgia Albertin | Alessio Miaschi | Dominique Brunato
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models
Gabriele Sarti | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how they are modeled in a neural language model (NLM). The first perspective takes into account multiple online behavioral metrics obtained from eye-tracking recordings. The second one concerns the offline perception of complexity measured by explicit human judgments. Using a broad spectrum of linguistic features modeling lexical, morpho-syntactic, and syntactic properties of sentences, we perform a comprehensive analysis of linguistic phenomena associated with the two complexity viewpoints and report similarities and differences. We then show the effectiveness of linguistic features when explicitly leveraged by a regression model for predicting sentence complexity and compare its results with the ones obtained by a fine-tuned neural language model. We finally probe the NLM’s linguistic competence before and after fine-tuning, highlighting how linguistic information encoded in representations changes when the model learns to predict complexity.

Sentence Complexity in Context
Benedetta Iavarone | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

We study the influence of context on how humans evaluate the complexity of a sentence in English. We collect a new dataset of sentences, where each sentence is rated for perceived complexity within different contextual windows. We carry out an in-depth analysis to detect which linguistic features correlate more with complexity judgments and with the degree of agreement among annotators. We train several regression models, using either explicit linguistic features or contextualized word embeddings, to predict the mean complexity values assigned to sentences in the different contextual windows, as well as their standard deviation. Results show that models leveraging explicit features capturing morphosyntactic and syntactic phenomena always perform better, especially when they have access to features extracted from all contextual sentences.

2020

The Style of a Successful Story: a Computational Study on the Fanfiction Genre
Andrea Mattei | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Tracking the Evolution of Written Language Competence in L2 Spanish Learners
Alessio Miaschi | Sam Davidson | Dominique Brunato | Felice Dell’Orletta | Kenji Sagae | Claudia Helena Sanchez-Gutierrez | Giulia Venturi
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners using a wide range of linguistic features automatically extracted from students’ written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflects the explicit instructions that students receive during each course.

Linguistic Profiling of a Neural Language Model
Alessio Miaschi | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks. We also find that BERT’s capacity to encode different kinds of linguistic properties has a positive influence on its predictions: the more readable linguistic information about a sentence it stores, the higher its capacity to predict the expected label assigned to that sentence.

Is Neural Language Model Perplexity Related to Readability?
Alessio Miaschi | Chiara Alzetta | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Profiling-UD: a Tool for Linguistic Profiling of Texts
Dominique Brunato | Andrea Cimino | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we introduce Profiling-UD, a new text analysis tool inspired by the principles of linguistic profiling that can support language variation research from different perspectives. It allows the extraction of more than 130 features, spanning different levels of linguistic description. Beyond the large number of features that can be monitored, a main novelty of Profiling-UD is that it has been specifically devised to be multilingual since it is based on the Universal Dependencies framework. In the second part of the paper, we demonstrate the effectiveness of these features in a number of theoretical and applicative studies in which they were successfully used for text and author profiling.

Italian Transformers Under the Linguistic Lens
Alessio Miaschi | Gabriele Sarti | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

What Makes a Review helpful? Predicting the Helpfulness of Italian TripAdvisor Reviews
Giulia Chiriatti | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Lost in Text. A Cross-Genre Analysis of Linguistic Phenomena within Text
Chiara Buongiovanni | Francesco Gracci | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Italian and English Sentence Simplification: How Many Differences?
Martina Fieromonte | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

2018

Lexicon and Syntax: Complexity across Genres and Language Varieties
Pietro Dell’Oglio | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

DARC-IT: a DAtaset for Reading Comprehension in ITalian
Dominique Brunato | Martina Valeriani | Felice Dell’Orletta
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Sentences and Documents in Native Language Identification
Andrea Cimino | Felice Dell’Orletta | Dominique Brunato | Giulia Venturi
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Is this Sentence Difficult? Do you Agree?
Dominique Brunato | Lorenzo De Mattei | Felice Dell’Orletta | Benedetta Iavarone | Giulia Venturi
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we present a crowdsourcing-based approach to model the human perception of sentence complexity. We collect a large corpus of sentences rated with judgments of complexity for two typologically-different languages, Italian and English. We test our approach in two experimental scenarios aimed to investigate the contribution of a wide set of lexical, morpho-syntactic and syntactic phenomena in predicting i) the degree of agreement among annotators independently from the assigned judgment and ii) the perception of sentence complexity.

Gender and Genre Linguistic Profiling: A Case Study on Female and Male Journalistic and Diary Prose
Eleonora Cocciu | Dominique Brunato | Giulia Venturi | Felice Dell’Orletta
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2017

On the order of Words in Italian: a Study on Genre vs Complexity
Dominique Brunato | Felice Dell’Orletta
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Dominique Brunato | Felice Dell’Orletta | Giulia Venturi | Thomas François | Philippe Blache
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Dominique Brunato | Andrea Cimino | Felice Dell’Orletta | Giulia Venturi
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Design and Annotation of the First Italian Corpus for Text Simplification
Dominique Brunato | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the 9th Linguistic Annotation Workshop
