Linguistic Issues in Language Technology, Volume 13, 2016
Verb phrase (VP) ellipsis is the omission of a verb phrase whose meaning can be reconstructed from the linguistic or real-world context. It is licensed in English by auxiliary verbs, often modal auxiliaries: She can go to Hawaii but he can’t [e]. This paper describes a system called ViPER (VP Ellipsis Resolver) that detects and resolves VP ellipsis, relying on linguistic principles such as syntactic parallelism, modality correlations, and the delineation of core vs. peripheral sentence constituents. The key insight guiding the work is that not all cases of ellipsis are equally difficult: some can be detected and resolved with high confidence even before we are able to build systems with human-level semantic and pragmatic understanding of text.
Quantification (see e.g. Peters and Westerst ̊ahl, 2006) is probably one of the most extensively studied phenomena in formal semantics. But because of the specific representation of meaning assumed by modeltheoretic semantics (one where a true model of the world is a priori available), research in the area has primarily focused on one question: what is the relation of a quantifier to the truth value of a sentence? In contrast, relatively little has been said about the way the underlying model comes about, and its relation to individual speakers’ conceptual knowledge. In this paper, we make a first step in investigating how native speakers of English model relations between non-grounded sets, by observing how they quantify simple statements. We first give some motivation for our task, from both a theoretical linguistic and computational semantic point of view (§2). We then describe our annotation setup (§3) and follow on with an analysis of the produced dataset, conducting a quantitative evaluation which includes inter-annotator agreement for different classes of predicates (§4). We observe that there is significant agreement between speakers but also noticeable variations. We posit that in settheoretic terms, there are as many worlds as there are speakers (§5), but the overwhelming use of underspecified quantification in ordinary language covers up the individual differences that might otherwise be observed.
This paper proposes a method for improving the results of a statistical Machine Translation system using boundedness, a pragmatic component of the verbal phrase’s lexical aspect. First, the paper presents manual and automatic annotation experiments for lexical aspect in EnglishFrench parallel corpora. It will be shown that this aspectual property is identified and classified with ease both by humans and by automatic systems. Second, Statistical Machine Translation experiments using the boundedness annotations are presented. These experiments show that the information regarding lexical aspect is useful to improve the output of a Machine Translation system in terms of better choices of verbal tenses in the target language, as well as better lexical choices. Ultimately, this work aims at providing a method for the automatic annotation of data with boundedness information and at contributing to Machine Translation by taking into account linguistic data.
Abstract syntax is a semantic tree representation that lies between parse trees and logical forms. It abstracts away from word order and lexical items, but contains enough information to generate both surface strings and logical forms. Abstract syntax is commonly used in compilers as an intermediate between source and target languages. Grammatical Framework (GF) is a grammar formalism that generalizes the idea to natural languages, to capture cross-lingual generalizations and perform interlingual translation. As one of the main results, the GF Resource Grammar Library (GF-RGL) has implemented a shared abstract syntax for over 30 languages. Each language has its own set of concrete syntax rules (morphology and syntax), by which it can be generated from the abstract syntax and parsed into it. This paper presents a conversion method from abstract syntax trees to dependency trees. The method is applied for converting GF-RGL trees to Universal Dependencies (UD), which uses a common set of labels for different languages. The correspondence between GF-RGL and UD turns out to be good, and the relatively few discrepancies give rise to interesting questions about universality. The conversion also has potential for practical applications: (1) it makes the GF parser usable as a rule-based dependency parser; (2) it enables bootstrapping UD treebanks from GF treebanks; (3) it defines formal criteria to assess the informal annotation schemes of UD; (4) it gives a method to check the consistency of manually annotated UD trees with respect to the annotation schemes; (5) it makes information from UD treebanks available.