Manfred Pinkal


2019

We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at http://www.sfb1102.uni-saarland.de/?page_id=2582
Script knowledge consists of detailed information on everyday activities. Such information is often taken for granted in text and needs to be inferred by readers. Therefore, script knowledge is a central component to language comprehension. Previous work on representing scripts is mostly based on extensive manual work or limited to scenarios that can be found with sufficient redundancy in large corpora. We introduce the task of scenario detection, in which we identify references to scripts. In this task, we address a wide range of different scripts (200 scenarios) and we attempt to identify all references to them in a collection of narrative texts. We present a first benchmark data set and a baseline model that tackles scenario detection using techniques from topic segmentation and text classification.

2018

We address the task of visual semantic role labeling (vSRL), the identification of the participants of a situation or event in a visual scene, and their labeling with their semantic relations to the event or situation. We render candidate participants as image regions of objects, and train a model which learns to ground roles in the regions which depict the corresponding participant. Experimental results demonstrate that we can train a vSRL model without reliance on prohibitive image-based role annotations, by utilizing noisy data which we extract automatically from image captions using a linguistic SRL system. Furthermore, our model induces frame—semantic visual representations, and their comparison to previous work on supervised visual verb sense disambiguation yields overall better results.
This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript. It contains a high number of questions that require commonsense knowledge for finding the correct answer. 11 teams from 4 different countries participated in this shared task, most of them used neural approaches. The best performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin, but still far from the human upper bound, which was found to be at 98%.

2017

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.
Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.
Word embeddings are now a standard technique for inducing meaning representations for words. For getting good representations, it is important to take into account different senses of a word. In this paper, we propose a mixture model for learning multi-sense word embeddings. Our model generalizes the previous works in that it allows to induce different weights of different senses of a word. The experimental results show that our model outperforms previous models on standard evaluation tasks.
Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.
We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.

2016

We present an annotation study on a representative dataset of literal and idiomatic uses of German infinitive-verb compounds in newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus resource which offers itself as a testbed for automatic idiomaticity detection and coarse-grained word-sense disambiguation. We trained a classifier on the corpus which was able to distinguish literal and idiomatic uses with an accuracy of 85 %.
This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.
Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.

2015

2014

Paraphrases and paraphrasing algorithms have been found of great importance in various natural language processing tasks. While most paraphrase extraction approaches extract equivalent sentences, sentences are an inconvenient unit for further processing, because they are too specific, and often not exact paraphrases. Paraphrase fragment extraction is a technique that post-processes sentential paraphrases and prunes them to more convenient phrase-level units. We present a new approach that uses semantic roles to extract paraphrase fragments from sentence pairs that share semantic content to varying degrees, including full paraphrases. In contrast to previous systems, the use of semantic parses allows for extracting paraphrases with high wording variance and different syntactic categories. The approach is tested on four different input corpora and compared to two previous systems for extracting paraphrase fragments. Our system finds three times as many good paraphrase fragments per sentence pair as the baselines, and at the same time outputs 30% fewer unrelated fragment pairs.

2013

Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general purpose corpus that aligns high quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other. Experimental results demonstrate that a text-based model of similarity between actions improves substantially when combined with visual information from videos depicting the described actions.

2012

2011

2010

We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases accuracy of statistical semantic parsers by reducing the number of word-senses (frames) per lemma, and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that by coarsening FrameNet we do not incur a significant loss of information that is relevant to the Recognizing Textual Entailment task.

2009

2008

We present an approach for the cross-lingual induction of speech recognition grammars that separates the task of translation from the task of grammar generation. The source speech recognition grammar is used to generate phrases, which are translated by a common translation service. The target recognition grammar is induced by using the production rules of the source language, manually translated sentences and a statistical word alignment tool. We induce grammars for the target languages Spanish and Japanese. The coverage of the resulting grammars is evaluated on two corpora and compared quantitatively and qualitatively to a grammar induced with unsupervised monolingual grammar induction.

2006

This paper describes the SALSA corpus, a large German corpus manually annotated with manual role-semantic annotation, based on the syntactically annotated TIGER newspaper corpus. The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the annotation framework (frame semantics) and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications.

2003

2000

1997

1996

1991

1986