William Croft

2025

Verbal Predication Constructions in Universal Dependencies
William Croft | Joakim Nivre
Proceedings of the Second International Workshop on Construction Grammars and NLP

Is the framework of Universal Dependencies (UD) compatible with findings from linguistic typology about constructions in the world’s languages? To address this question, we need to systematically review how UD represents these constructions, and how it handles the range of morphosyntactic variation attested across languages. In this paper, we present the results of such a review focusing on verbal predication constructions. We find that, although UD can represent all major constructions in this area, the guidelines are not completely coherent with respect to the criteria for core argument relations and not completely systematic in the definition of subtypes for nonbasic voice constructions. To improve the overall coherence of the guidelines, we propose a number of revisions for future versions of UD.

pdf bib abs

Reference and Modification in Universal Dependencies
Joakim Nivre | William Croft
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)

Is the framework of Universal Dependencies (UD) compatible with findings from linguistic typology? To address this question, we need to systematically review how UD represents linguistic constructions in the world’s languages, and how it handles the range of morphosyntactic variation attested in linguistic typology. In this paper, we start this review by discussing reference and modification constructions. The review shows that, although UD can represent all major constructions in this area, there are a number of cases where UD categories do not align systematically with a typological classification of constructions, and where constructional similarity is therefore not transparent across languages. We also identify limitations in the representation of certain morphosyntactic strategies, notably indexation and linkers. To overcome these limitations, we propose a number of revisions that may be considered for future versions of UD.

2024

pdf bib abs

This paper presents MoCCA, a Model of Comparative Concepts for Aligning Constructicons under development by a consortium of research groups building Constructicons of different languages including Brazilian Portuguese, English, German and Swedish. The Constructicons will be aligned by using comparative concepts (CCs) providing language-neutral definitions of linguistic properties. The CCs are drawn from typological research on grammatical categories and constructions, and from FrameNet frames, organized in a conceptual network. Language-specific constructions are linked to the CCs in accordance with general principles. MoCCA is organized into files of two types: a largely static CC Database file and multiple Linking files containing relations between constructions in a Constructicon and the CCs. Tools are planned to facilitate visualization of the CC network and linking of constructions to the CCs. All files and guidelines will be versioned, and a mechanism is set up to report cases where a language-specific construction cannot be easily linked to existing CCs.

This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture the commonalities and variations across languages and this is done through the use of a common set of abstract concepts, relations, and attributes as well as concrete concepts derived from words from invidual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in terms of their linguistic properties and resource availability. We also describe on-going efforts to enlarge this data set and extend it to other genres and modalities. We also briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets.

pdf bib abs

The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements—for example, interrogative sentences with special markers and/or word orders—are not labeled holistically. We argue for (i) augmenting UD annotations with a ‘UCxn’ annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.

2023

This paper presents detailed mappings between the structures used in Abstract Meaning Representation (AMR) and those used in Uniform Meaning Representation (UMR). These structures include general semantic roles, rolesets, and concepts that are largely shared between AMR and UMR, but with crucial differences. While UMR annotation of new low-resource languages is ongoing, AMR-annotated corpora already exist for many languages, and these AMR corpora are ripe for conversion to UMR format. Rather than focusing on semantic coverage that is new to UMR (which will likely need to be dealt with manually), this paper serves as a resource (with illustrated mappings) for users looking to understand the fine-grained adjustments that have been made to the representation techniques for semantic categoriespresent in both AMR and UMR.

2021

pdf bib abs

Theoretical and Practical Issues in the Semantic Annotation of Four Indigenous Languages
Jens E. L. Van Gysel | Meagan Vigus | Lukas Denk | Andrew Cowell | Rosa Vallejos | Tim O’Gorman | William Croft
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

Computational resources such as semantically annotated corpora can play an important role in enabling speakers of indigenous minority languages to participate in government, education, and other domains of public life in their own language. However, many languages – mainly those with small native speaker populations and without written traditions – have little to no digital support. One hurdle in creating such resources is that for many languages, few speakers would be capable of annotating texts – a task which requires literacy and some linguistic training – and that these experts’ time is typically in high demand for language planning work. This paper assesses whether typologically trained non-speakers of an indigenous language can feasibly perform semantic annotation using Uniform Meaning Representations, thus allowing for the creation of computational materials without putting further strain on community resources.

2020

pdf bib

pdf bib abs

This paper presents a “road map” for the annotation of semantic categories in typologically diverse languages, with potentially few linguistic resources, and often no existing computational resources. Past semantic annotation efforts have focused largely on high-resource languages, or relatively low-resource languages with a large number of native speakers. However, there are certain typological traits, namely the synthesis of multiple concepts into a single word, that are more common in languages with a smaller speech community. For example, what is expressed as a sentence in a more analytic language like English, may be expressed as a single word in a more synthetic language like Arapaho. This paper proposes solutions for annotating analytic and synthetic languages in a comparable way based on existing typological research, and introduces a road map for the annotation of languages with a dearth of resources.

pdf bib abs

Representing constructional metaphors
Pavlina Kalm | Michael Regan | Sook-kyung Lee | Chris Peverada | William Croft
Proceedings of the Second International Workshop on Designing Meaning Representations

This paper introduces a representation and annotation scheme for argument structure constructions that are used metaphorically with verbs in different semantic domains. We aim to contribute to the study of constructional metaphors which has received little attention in theoretical and computational linguistics. The proposed representation consists of a systematic mapping between the constructional and verbal event structures in two domains. It reveals the semantic motivations that lead to constructions being metaphorically extended. We demonstrate this representation on argument structure constructions with Transfer of Possession verbs and test the viability of this scheme with an annotation exercise.

2019

pdf bib

pdf bib abs

Cross-Linguistic Semantic Annotation: Reconciling the Language-Specific and the Universal
Jens E. L. Van Gysel | Meagan Vigus | Pavlina Kalm | Sook-kyung Lee | Michael Regan | William Croft
Proceedings of the First International Workshop on Designing Meaning Representations

Developers of cross-linguistic semantic annotation schemes face a number of issues not encountered in monolingual annotation. This paper discusses four such issues, related to the establishment of annotation labels, and the treatment of languages with more fine-grained, more coarse-grained, and cross-cutting categories. We propose that a lattice-like architecture of the annotation categories can adequately handle all four issues, and at the same time remain both intuitive for annotators and faithful to typological insights. This position is supported by a brief annotation experiment.

pdf bib abs

Event Structure Representation: Between Verbs and Argument Structure Constructions
Pavlina Kalm | Michael Regan | William Croft
Proceedings of the First International Workshop on Designing Meaning Representations

This paper proposes a novel representation of event structure by separating verbal semantics and the meaning of argument structure constructions that verbs occur in. Our model demonstrates how the two meaning representations interact. Our model thus effectively deals with various verb construals in different argument structure constructions, unlike purely verb-based approaches. However, unlike many constructionally-based approaches, we also provide a richer representation of the event structure evoked by the verb meaning.

pdf bib abs

A Dependency Structure Annotation for Modality
Meagan Vigus | Jens E. L. Van Gysel | William Croft
Proceedings of the First International Workshop on Designing Meaning Representations

This paper presents an annotation scheme for modality that employs a dependency structure. Events and sources (here, conceivers) are represented as nodes and epistemic strength relations characterize the edges. The epistemic strength values are largely based on Saurí and Pustejovsky’s (2009) FactBank, while the dependency structure mirrors Zhang and Xue’s (2018b) approach to temporal relations. Six documents containing 377 events have been annotated by two expert annotators with high levels of agreement.

2018

pdf bib abs

A Rich Annotation Scheme for Mental Events
William Croft | Pavlína Pešková | Michael Regan | Sook-kyung Lee
Proceedings of the Workshop Events and Stories in the News 2018

We present a rich annotation scheme for the structure of mental events. Mental events are those in which the verb describes a mental state or process, usually oriented towards an external situation. While physical events have been described in detail and there are numerous studies of their semantic analysis and annotation, mental events are less thoroughly studied. The annotation scheme proposed here is based on decompositional analyses in the semantic and typological linguistic literature. The scheme was applied to the news corpus from the 2016 Events workshop, and error analysis of the test annotation provides suggestions for refinement and clarification of the annotation scheme.

pdf bib abs

Annotation of Tense and Aspect Semantics for Sentential AMR
Lucia Donatelli | Michael Regan | William Croft | Nathan Schneider
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Although English grammar encodes a number of semantic contrasts with tense and aspect marking, these semantics are currently ignored by Abstract Meaning Representation (AMR) annotations. This paper extends sentence-level AMR to include a coarse-grained treatment of tense and aspect semantics. The proposed framework augments the representation of finite predications to include a four-way temporal distinction (event time before, up to, at, or after speech time) and several aspectual distinctions (including static vs. dynamic, habitual vs. episodic, and telic vs. atelic). This will enable AMR to be used for NLP tasks and applications that require sophisticated reasoning about time and event structure.

2017

pdf bib abs

Integrating Decompositional Event Structures into Storylines
William Croft | Pavlína Pešková | Michael Regan
Proceedings of the Events and Stories in the News Workshop

Storyline research links together events in stories and specifies shared participants in those stories. In these analyses, an atomic event is assumed to be a single clause headed by a single verb. However, many analyses of verbal semantics assume a decompositional analysis of events expressed in single clauses. We present a formalization of a decompositional analysis of events in which each participant in a clausal event has their own temporally extended subevent, and the subevents are related through causal and other interactions. This decomposition allows us to represent storylines as an evolving set of interactions between participants over time.