Dominique Brunato


2021

pdf bib
What Makes My Model Perplexed? A Linguistic Investigation on Neural Language Models Perplexity
Alessio Miaschi | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

This paper presents an investigation aimed at studying how the linguistic structure of a sentence affects the perplexity of two of the most popular Neural Language Models (NLMs), BERT and GPT-2. We first compare the sentence-level likelihood computed with BERT and the GPT-2’s perplexity showing that the two metrics are correlated. In addition, we exploit linguistic features capturing a wide set of morpho-syntactic and syntactic phenomena showing how they contribute to predict the perplexity of the two NLMs.

pdf bib
That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models
Gabriele Sarti | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how they are modeled in a neural language model (NLM). The first perspective takes into account multiple online behavioral metrics obtained from eye-tracking recordings. The second one concerns the offline perception of complexity measured by explicit human judgments. Using a broad spectrum of linguistic features modeling lexical, morpho-syntactic, and syntactic properties of sentences, we perform a comprehensive analysis of linguistic phenomena associated with the two complexity viewpoints and report similarities and differences. We then show the effectiveness of linguistic features when explicitly leveraged by a regression model for predicting sentence complexity and compare its results with the ones obtained by a fine-tuned neural language model. We finally probe the NLM’s linguistic competence before and after fine-tuning, highlighting how linguistic information encoded in representations changes when the model learns to predict complexity.

pdf bib
Sentence Complexity in Context
Benedetta Iavarone | Dominique Brunato | Felice Dell’Orletta
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

We study the influence of context on how humans evaluate the complexity of a sentence in English. We collect a new dataset of sentences, where each sentence is rated for perceived complexity within different contextual windows. We carry out an in-depth analysis to detect which linguistic features correlate more with complexity judgments and with the degree of agreement among annotators. We train several regression models, using either explicit linguistic features or contextualized word embeddings, to predict the mean complexity values assigned to sentences in the different contextual windows, as well as their standard deviation. Results show that models leveraging explicit features capturing morphosyntactic and syntactic phenomena perform always better, especially when they have access to features extracted from all contextual sentences.

2020

pdf bib
Linguistic Profiling of a Neural Language Model
Alessio Miaschi | Dominique Brunato | Felice Dell’Orletta | Giulia Venturi
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks. We also find that BERT’s capacity to encode different kind of linguistic properties has a positive influence on its predictions: the more it stores readable linguistic information of a sentence, the higher will be its capacity of predicting the expected label assigned to that sentence.

pdf bib
Profiling-UD: a Tool for Linguistic Profiling of Texts
Dominique Brunato | Andrea Cimino | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we introduce Profiling–UD, a new text analysis tool inspired to the principles of linguistic profiling that can support language variation research from different perspectives. It allows the extraction of more than 130 features, spanning across different levels of linguistic description. Beyond the large number of features that can be monitored, a main novelty of Profiling–UD is that it has been specifically devised to be multilingual since it is based on the Universal Dependencies framework. In the second part of the paper, we demonstrate the effectiveness of these features in a number of theoretical and applicative studies in which they were successfully used for text and author profiling.

pdf bib
Tracking the Evolution of Written Language Competence in L2 Spanish Learners
Alessio Miaschi | Sam Davidson | Dominique Brunato | Felice Dell’Orletta | Kenji Sagae | Claudia Helena Sanchez-Gutierrez | Giulia Venturi
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners using a wide range of linguistic features automatically extracted from students’ written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflect the explicit instructions that students receive during each course.

2018

pdf bib
Is this Sentence Difficult? Do you Agree?
Dominique Brunato | Lorenzo De Mattei | Felice Dell’Orletta | Benedetta Iavarone | Giulia Venturi
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we present a crowdsourcing-based approach to model the human perception of sentence complexity. We collect a large corpus of sentences rated with judgments of complexity for two typologically-different languages, Italian and English. We test our approach in two experimental scenarios aimed to investigate the contribution of a wide set of lexical, morpho-syntactic and syntactic phenomena in predicting i) the degree of agreement among annotators independently from the assigned judgment and ii) the perception of sentence complexity.

2017

pdf bib
On the order of Words in Italian: a Study on Genre vs Complexity
Dominique Brunato | Felice Dell’Orletta
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

pdf bib
PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Dominique Brunato | Andrea Cimino | Felice Dell’Orletta | Giulia Venturi
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Dominique Brunato | Felice Dell’Orletta | Giulia Venturi | Thomas François | Philippe Blache
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

2015

pdf bib
Design and Annotation of the First Italian Corpus for Text Simplification
Dominique Brunato | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of The 9th Linguistic Annotation Workshop