Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing

Leonor Becerra-Bonache, M. Dolores Jiménez-López, Carlos Martín-Vide, Adrià Torrens-Urrutia (Editors)

Santa Fe, New Mexico
Association for Computational Linguistics

A Gold Standard to Measure Relative Linguistic Complexity with a Grounded Language Learning Model
Leonor Becerra-Bonache | Henning Christiansen | M. Dolores Jiménez-López

This paper focuses on linguistic complexity from a relative perspective. It presents a grounded language learning system that can be used to study linguistic complexity from a developmental point of view and introduces a tool for generating a gold standard in order to evaluate the performance of the learning system. In general, researchers agree that it is more feasible to approach complexity from an objective or theory-oriented viewpoint than from a subjective or user-related point of view. Studies that have adopted a relative complexity approach have shown some preference for L2 learners. In this paper, we try to show that computational models of the process of language acquisition may be an important tool for considering children and the process of first language acquisition as suitable candidates for evaluating the complexity of languages.

Computational Complexity of Natural Languages: A Reasoned Overview
António Branco

There has been an upsurge of research interest in natural language complexity. As this interest will benefit from being informed by established contributions in this area, this paper presents a reasoned overview of central results concerning the computational complexity of natural language parsing. This overview also seeks to help to understand why, contrary to recent and widespread assumptions, it is by no means sufficient that an agent handles sequences of items under a pattern $a^n b^n$ or under a pattern $a^n b^m c^n d^m$ to ascertain ipso facto that this is the result of at least an underlying context-free grammar or an underlying context-sensitive grammar, respectively. In addition, it seeks to help to understand why it is also not sufficient that an agent handles sequences of items under a pattern $a^n b^n$ for it to be deemed as having a cognitive capacity of higher computational complexity.
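
One way to picture the point about the pattern a^n b^n is sketched below. It is an illustration under stated assumptions, not the paper's own argument: the function names and the bound `max_n` are hypothetical. The first recognizer accepts a^n b^n for every n, while the second accepts it only up to a fixed bound, which makes its language finite and therefore regular. On any sample whose strings stay within the bound, the two behave identically.

```python
import re


def counter_recognizer(s: str) -> bool:
    """Accept exactly the strings a^n b^n with n >= 1, for unbounded n."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def bounded_recognizer(s: str, max_n: int = 5) -> bool:
    """Accept a^n b^n only for 1 <= n <= max_n (hypothetical bound).

    The accepted set is finite, so a plain regular expression suffices.
    """
    pattern = "|".join(f"a{{{n}}}b{{{n}}}" for n in range(1, max_n + 1))
    return re.fullmatch(pattern, s) is not None


if __name__ == "__main__":
    # A small sample in which no string exceeds the bound.
    sample = ["ab", "aabb", "aaabbb", "aab", "abab", "ba"]
    for s in sample:
        print(s, counter_recognizer(s), bounded_recognizer(s))
```

The two functions only come apart on inputs longer than the bound, for example six `a`s followed by six `b`s with the default `max_n`, so agreement on a bounded sample says little about which kind of device produced the behaviour.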

Modeling Violations of Selectional Restrictions with Distributional Semantics
Emmanuele Chersoni | Adrià Torrens Urrutia | Philippe Blache | Alessandro Lenci

Distributional Semantic Models have been successfully used for modeling selectional preferences in a variety of scenarios, since distributional similarity naturally provides an estimate of the degree to which an argument satisfies the requirement of a given predicate. However, we argue that the performance of such models on rare verb-argument combinations has received relatively little attention: it is not clear whether they are able to distinguish the combinations that are simply atypical, or implausible, from the semantically anomalous ones, and in particular, they have never been tested on the task of modeling their differences in processing complexity. In this paper, we compare two different models of thematic fit by testing their ability to identify violations of selectional restrictions in two datasets from experimental studies.
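
The idea that distributional similarity estimates how well an argument fits a predicate can be sketched as follows. This is a minimal illustration and not either of the two models compared in the paper: it assumes a prototype approach in which the fit of a candidate is its cosine similarity to the centroid of typical fillers, and all vectors and names are toy values invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def thematic_fit(candidate, typical_fillers):
    """Similarity of a candidate argument to the prototype of typical fillers."""
    return cosine(candidate, centroid(typical_fillers))


# Toy 3-dimensional vectors (hypothetical, not taken from any corpus).
TYPICAL_OBJECTS_OF_EAT = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
PIZZA = [0.85, 0.15, 0.0]
IDEA = [0.0, 0.1, 0.9]

if __name__ == "__main__":
    print("eat pizza:", thematic_fit(PIZZA, TYPICAL_OBJECTS_OF_EAT))
    print("eat idea: ", thematic_fit(IDEA, TYPICAL_OBJECTS_OF_EAT))
```

A score of this kind ranks fillers on a single scale, which is why it is an open question whether it separates merely atypical combinations from anomalous ones.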

Comparing morphological complexity of Spanish, Otomi and Nahuatl
Ximena Gutierrez-Vasques | Victor Mijangos

We use two small parallel corpora for comparing the morphological complexity of Spanish, Otomi and Nahuatl. These languages belong to different linguistic families, and the latter two are low-resourced. We take into account two quantitative criteria: on the one hand, the distribution of types over tokens in a corpus; on the other, perplexity and entropy as indicators of word structure predictability. We show that a language can be complex in terms of how many different morphological word forms it can produce, yet less complex in terms of the predictability of the internal structure of its words.
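
The two kinds of criteria can be sketched roughly as below. This is a simplified illustration and not the authors' pipeline: it assumes whitespace-tokenized input and uses a unigram distribution over the units passed in, whereas the paper measures predictability of word-internal structure. The function names are hypothetical.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def unigram_entropy(units):
    """Shannon entropy in bits of the empirical unigram distribution."""
    if not units:
        raise ValueError("units must be non-empty")
    counts = Counter(units)
    total = len(units)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(units):
    """Perplexity as 2 raised to the entropy in bits."""
    return 2 ** unigram_entropy(units)


if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw the cat".split()
    print("TTR:", type_token_ratio(tokens))
    print("entropy:", unigram_entropy(tokens))
    print("perplexity:", perplexity(tokens))
```

Applied to sub-word units instead of whole words, the same entropy function gives a crude view of how predictable word structure is, which is the sense in which the two criteria can diverge for one language.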

Uniform Information Density Effects on Syntactic Choice in Hindi
Ayush Jain | Vishal Singh | Sidharth Ranjan | Rajakrishnan Rajkumar | Sumeet Agarwal

According to the UNIFORM INFORMATION DENSITY (UID) hypothesis (Levy and Jaeger, 2007; Jaeger, 2010), speakers tend to distribute information density across the signal uniformly while producing language. The prior works cited above studied syntactic reduction in language production at particular choice points in a sentence. In contrast, we use a variant of the above UID hypothesis in order to investigate the extent to which word order choices in Hindi are influenced by the drive to minimize the variance of information across entire sentences. To this end, we propose multiple lexical and syntactic measures (at both word and constituent levels) to capture the uniform spread of information across a sentence. Subsequently, we incorporate these measures in machine learning models that aim to distinguish between a naturally occurring corpus sentence and its grammatical variants (expressing the same idea). Our results indicate that our UID measures are not a significant factor in predicting the corpus sentence in the presence of lexical surprisal, a competing control predictor. Finally, in light of other recent works, we conclude with a discussion of reasons for UID not being suitable for a theory of word order.
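
The notion of minimizing the variance of information across a sentence can be made concrete with a small sketch. It shows one possible word-level measure only, assuming that each word's information is its surprisal, the negative base-2 logarithm of its conditional probability. The probabilities are invented for the example and the function names are hypothetical; the paper's own measures are not reproduced here.

```python
import math
from statistics import pvariance


def surprisals(word_probs):
    """Per-word surprisal in bits from conditional word probabilities."""
    return [-math.log2(p) for p in word_probs]


def uid_variance(word_probs):
    """Population variance of per-word surprisal; lower means more uniform."""
    return pvariance(surprisals(word_probs))


if __name__ == "__main__":
    # Hypothetical probabilities for two orderings of the same words.
    corpus_order = [0.25, 0.25, 0.25, 0.25]
    variant_order = [0.5, 0.125, 0.5, 0.125]
    print("corpus order: ", uid_variance(corpus_order))
    print("variant order:", uid_variance(variant_order))
```

Under this measure an ordering with perfectly even surprisal scores zero, and a ranking model could take the score as one feature alongside total lexical surprisal.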

Investigating the importance of linguistic complexity features across different datasets related to language learning
Ildikó Pilán | Elena Volodina

We present the results of our investigations aiming at identifying the most informative linguistic complexity features for classifying language learning levels in three different datasets. The datasets vary across two dimensions: the size of the instances (texts vs. sentences) and the language learning skill they involve (reading comprehension texts vs. texts written by learners themselves). We present a subset of the most predictive features for each dataset, taking into consideration significant differences in their per-class mean values and show that these subsets lead not only to simpler models, but also to an improved classification performance. Furthermore, we pinpoint fourteen central features that are good predictors regardless of the size of the linguistic unit analyzed or the skills involved, which include both morpho-syntactic and lexical dimensions.

An Approach to Measuring Complexity with a Fuzzy Grammar & Degrees of Grammaticality
Adrià Torrens Urrutia

This paper presents an approach to evaluating the complexity of a given natural language input by means of a Fuzzy Grammar with some fuzzy logic formulations. Usually, approaches in linguistics have described a natural language grammar by means of discrete terms. However, a grammar can be explained in terms of degrees by following the concepts of linguistic gradience and fuzziness. Understanding a grammar as a fuzzy or gradient object allows us to establish degrees of grammaticality for every linguistic input. This is meaningful for linguistic complexity, considering that the less grammatical an input is, the more complex its processing will be. In this regard, the degree of complexity of a linguistic input (which is a linguistic representation of a natural language expression) depends on the chosen grammar. The bases of the fuzzy grammar are shown here. Some of these are described by Fuzzy Type Theory. The linguistic inputs are characterized by constraints through a Property Grammar.
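
The link between degrees of grammaticality and complexity can be pictured with a toy calculation. This is a loose sketch and not the paper's formalism: it assumes that an input is checked against weighted constraints, that its degree of grammaticality is the weighted share of satisfied constraints, and that complexity is one minus that degree. The constraint names and weights are hypothetical.

```python
def degree_of_grammaticality(constraints):
    """Weighted share of satisfied constraints, a value in [0, 1].

    `constraints` maps a constraint name to a (weight, satisfied) pair.
    """
    total = sum(weight for weight, _ in constraints.values())
    if total <= 0:
        raise ValueError("total constraint weight must be positive")
    satisfied = sum(weight for weight, ok in constraints.values() if ok)
    return satisfied / total


def degree_of_complexity(constraints):
    """Assumed inverse relation: the less grammatical, the more complex."""
    return 1.0 - degree_of_grammaticality(constraints)


if __name__ == "__main__":
    # Hypothetical constraints for one input, with invented weights.
    example = {
        "determiner_precedes_noun": (2.0, True),
        "subject_verb_agreement": (1.0, True),
        "noun_requires_determiner": (1.0, False),
    }
    print("grammaticality:", degree_of_grammaticality(example))
    print("complexity:", degree_of_complexity(example))
```

Because the result depends entirely on which constraints and weights are supplied, the sketch also reflects the remark that the degree of complexity depends on the chosen grammar.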