Linguistic Issues in Language Technology, Volume 11, 2014 - Theoretical and Computational Morphology: New Trends and Synergies
This paper is intended to elucidate some implications of usage-based linguistic theory for statistical and computational models of language acquisition, focusing on morphology and morphophonology. I discuss the need for grammar (a.k.a. abstraction), the contents of individual grammars (a potentially infinite number of constructions, paradigmatic mappings and predictive relationships between phonological units), the computational characteristics of constructions (complex non-crossover interactions among partially redundant features), resolution of competition among constructions (probability matching), and the need for multimodel inference in modeling internal grammars underlying the linguistic performance of a community.
This chapter demonstrates how compression algorithms can be used to address morphological and syntactic complexity in detail by analysing the contribution of specific linguistic features to English texts. The point of departure is the ongoing complexity debate and quest for complexity metrics. After decades of adhering to the equal complexity axiom, recent research seeks to define and measure linguistic complexity (Dahl 2004; Kortmann and Szmrecsanyi 2012; Miestamo et al. 2008). Against this backdrop, I present a new flavour of the Juola-style compression technique (Juola 1998), targeted manipulation. Essentially, compression algorithms are used to measure linguistic complexity via the relative informativeness of text samples. Thus, I assess the contribution of morphs such as –ing or –ed, and functional constructions such as progressive (be + verb-ing) or perfect (have + verb past participle) to the syntactic and morphological complexity in a mixed-genre corpus of Alice’s Adventures in Wonderland, the Gospel of Mark and newspaper texts. I find that a higher number of marker types leads to higher amounts of morphological complexity in the corpus. Syntactic complexity is reduced because the presence of morphological markers enhances the algorithmic prediction of linguistic patterns. To conclude, I show that information-theoretic methods yield linguistically meaningful results and can be used to measure the complexity of specific linguistic features in naturalistic corpora.
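The general idea of comparing compressed file sizes before and after removing a marker can be sketched as follows. This is a minimal illustration only, not the chapter's actual procedure: the use of zlib, the function names, and the naive regex that strips a word-final suffix are all assumptions made for the example.

```python
import re
import zlib


def compressed_size(text: str) -> int:
    """Return the size in bytes of the zlib-compressed UTF-8 text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def strip_suffix(text: str, suffix: str) -> str:
    """Remove a word-final suffix from every word with at least two
    characters before it. Hypothetical, deliberately naive manipulation:
    it also hits non-morphemic endings (e.g. 'thing' becomes 'th')."""
    pattern = re.compile(r"\b(\w{2,})" + re.escape(suffix) + r"\b")
    return pattern.sub(r"\1", text)


def compression_ratio(original: str, manipulated: str) -> float:
    """Compressed size of the manipulated text relative to the original.
    Comparing such ratios across manipulations is one way to gauge how
    much a marker contributes to the compressibility of a sample."""
    return compressed_size(manipulated) / compressed_size(original)


if __name__ == "__main__":
    sample = "Alice was beginning to get very tired of sitting by her sister."
    without_ing = strip_suffix(sample, "ing")
    print(without_ing)
    print(compression_ratio(sample, without_ing))
```

On samples this short the compressor's fixed overhead dominates, so the ratio is only meaningful for texts of realistic length.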
One compelling kind of evidence for the autonomy of a language’s morphology is the incidence of inflectional polyfunctionality, the systematic use of the same morphology to express distinct but related morphosyntactic content. Polyfunctionality is more complex than mere homophony. It can, in fact, arise in a number of ways: as an effect of rule invocation (wherein the same rule of exponence serves more than one function by interacting with other rules in more than one way), as an expression of morphosyntactic referral, as the effect of a rule of exponence realizing either a disjunction of property sets or a morphomic property set, or as the reflection of a morphosyntactic property set’s cross-categorial versatility. I distinguish these different sources of polyfunctionality in a formally precise way. It is inaccurate to see polyfunctionality as an ambiguating source of grammatical complexity; on the contrary, by enhancing the predictability of a language’s morphology, it may well enhance both the memorability of complex inflected forms and the ease with which they are processed.
Using the system of Ancient Greek verb endings as a case study, this paper deals with the cross-linguistically recurrent appearance of inflectional paradigms that, though generally characterized by cumulative exponence, contain segmentable “semi-separate” endings in low-frequency cells. Such an exponence system has information-theoretic properties which may be relevant from the point of view of morphological theory. In particular, both the phenomena of semi-separate exponence and the instances of syncretism that conform to the Brøndalian Principle of Compensation may be viewed as different manifestations of the same cross-linguistic tendency not to let a paradigm’s exponent set be too distant from the situation of equiprobability.
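Distance from equiprobability can be made concrete with Shannon entropy: a set of exponents is equiprobable when its entropy reaches the maximum, log2 of the number of exponents. The sketch below is an illustration under stated assumptions, not the paper's own measure; the function names and the frequency counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def normalized_entropy(counts):
    """Entropy divided by its maximum, log2(n), for n attested exponents.
    A value of 1.0 means the exponents are equiprobable; lower values
    mean the distribution is further from equiprobability."""
    n = sum(1 for c in counts if c > 0)
    if n <= 1:
        return 1.0  # a single exponent is trivially uniform
    return shannon_entropy(counts) / math.log2(n)


if __name__ == "__main__":
    # Hypothetical token frequencies of four exponents in a paradigm.
    skewed = [900, 60, 30, 10]
    balanced = [250, 250, 250, 250]
    print(normalized_entropy(skewed))
    print(normalized_entropy(balanced))
```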
Démonette is a derivational morphological network created from information provided by two existing lexical resources, DériF and Morphonette. It features a formal architecture in which words are associated with semantic types and where morphological relations, labelled with concrete and abstract bi-oriented definitions, connect derived words with their base and indirectly related words with each other.
This article aims to assess to what extent translation can shed light on the semantics of French evaluative prefixation by adopting Noël’s (2003) ‘translations as evidence for semantics’ approach. In French, evaluative prefixes can be classified along two dimensions (cf. Fradin and Montermini 2009): (1) a quantity dimension along a maximum/minimum axis, with the semantic values big and small, and (2) a quality dimension along a positive/negative axis, with the values good (excess; higher degree) and bad (lack; lower degree). In order to provide corpus-based insights into this semantic categorization, we analyze French evaluative prefixes alongside their English translation equivalents in a parallel corpus. To do so, we focus on periphrastic translations, as they are likely to ‘spell out’ the meaning of the French prefixes. The data used were extracted from the Europarl parallel corpus (Koehn 2005; Cartoni and Meyer 2012). Using a tailor-made program, we first aligned the French prefixed words with the corresponding word(s) in English target sentences, before proceeding to the evaluation of the aligned sequences and the manual analysis of the bilingual data. Results confirm that translation data can be used as evidence for semantics in morphological research and help refine existing semantic descriptions of evaluative prefixes.