Katrin Erk


2024

pdf bib
SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events
Sai Vallurupalli | Katrin Erk | Francis Ferraro
Findings of the Association for Computational Linguistics: ACL 2024

Interpreting and assessing goal driven actions is vital to understanding and reasoning over complex events. It is important to be able to acquire the knowledge needed for this understanding, though doing so is challenging. We argue that such knowledge can be elicited through a participant achievement lens. We analyze a complex event in a narrative according to the intended achievements of the participants in that narrative, the likely future actions of the participants, and the likelihood of goal success. We collect 6.3K high quality goal and action annotations reflecting our proposed participant achievement lens, with an average weighted Fleiss-Kappa IAA of 80%. Our collection contains annotated alternate versions of each narrative. These alternate versions vary minimally from the “original” story, but can license drastically different inferences. Our findings suggest that while modern large language models can reflect some of the goal-based knowledge we study, they find it challenging to fully capture the design and intent behind concerted actions, even when the model pretraining included the data from which we extracted the goal knowledge. We show that smaller models fine-tuned on our dataset can achieve performance surpassing larger models.

pdf bib
X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
Juan Rodriguez | Katrin Erk | Greg Durrett
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Understanding when two pieces of text convey the same information is a goal touching many subproblems in NLP, including textual entailment and fact-checking. This problem becomes more complex when those two pieces of text are in different languages. Here, we introduce X-PARADE (Cross-lingual Paragraph-level Analysis of Divergences and Entailments), the first cross-lingual dataset of paragraph-level information divergences. Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language, indicating whether a given piece of information is the same, new, or new but can be inferred. This last notion establishes a link with cross-language NLI. Aligned paragraphs are sourced from Wikipedia pages in different languages, reflecting real information divergences observed in the wild. Armed with our dataset, we investigate a diverse set of approaches for this problem, including classic token alignment from machine translation, textual entailment methods that localize their decisions, and prompting LLMs. Our results show that these methods vary in their capability to handle inferable information, but they all fall short of human performance.

pdf bib
Adjusting Interpretable Dimensions in Embedding Space with Human Judgments
Katrin Erk | Marianna Apidianaki
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Embedding spaces contain interpretable dimensions indicating gender, formality in style, or even object properties. This has been observed multiple times. Such interpretable dimensions are becoming valuable tools in different areas of study, from social science to neuroscience. The standard way to compute these dimensions uses contrasting seed words and computes difference vectors over them. This is simple but does not always work well. We combine seed-based vectors with guidance from human ratings of where words fall along a specific dimension, and evaluate on predicting both object properties like size and danger, and the stylistic properties of formality and complexity. We obtain interpretable dimensions with markedly better performance especially in cases where seed-based dimensions do not work well.
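
The seed-based baseline mentioned in the abstract above (difference vectors over contrasting seed words) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy 2-d vectors, and the choice to average the difference vectors are all assumptions, and the human-rating guidance that the paper adds is not shown.

```python
import math

def seed_dimension(vectors, seed_pairs):
    """Average the difference vectors (positive minus negative) over seed pairs."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for pos, neg in seed_pairs:
        for i in range(dim):
            total[i] += vectors[pos][i] - vectors[neg][i]
    return [t / len(seed_pairs) for t in total]

def project(vec, direction):
    """Scalar projection of vec onto the given direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(v * d for v, d in zip(vec, direction)) / norm

# Toy 2-d embeddings, invented for illustration only.
toy_vectors = {
    "whale": [4.0, 1.0],
    "ant": [0.0, 1.0],
    "mountain": [5.0, 0.0],
    "pebble": [1.0, 0.0],
    "dog": [2.0, 2.0],
    "horse": [3.0, 2.0],
}

# Hypothetical "size" dimension built from two contrasting seed pairs.
size_dim = seed_dimension(toy_vectors, [("whale", "ant"), ("mountain", "pebble")])

# Words are then scored by where they fall along that dimension.
scores = {w: project(toy_vectors[w], size_dim) for w in ("dog", "horse")}
```

With real embeddings, the resulting scores would typically be compared against human ratings of the same property, for instance with a rank correlation.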

pdf bib
To Learn or Not to Learn: Replaced Token Detection for Learning the Meaning of Negation
Gunjan Bhattarai | Katrin Erk
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

State-of-the-art language models perform well on a variety of language tasks, but they continue to struggle with understanding negation cues in tasks like natural language inference (NLI). Inspired by Hossain et al. (2020), who show under-representation of negation in language model pretraining datasets, we experiment with additional pretraining with negation data for which we introduce two new datasets. We also introduce a new learning strategy for negation building on ELECTRA’s (Clark et al., 2020) replaced token detection objective. We find that continuing to pretrain ELECTRA-Small’s discriminator leads to substantial gains on a variant of RTE (Recognizing Textual Entailment) with additional negation. On SNLI (Stanford NLI) (Bowman et al., 2015), there are no gains due to the extreme under-representation of negation in the data. Finally, on MNLI (Multi-NLI) (Williams et al., 2018), we find that performance on negation cues is primarily stymied by neutral-labeled examples.

2023

pdf bib
A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces
Gabriella Chronis | Kyle Mahowald | Katrin Erk
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study semantic construal in grammatical constructions using large language models. First, we project contextual word embeddings into three interpretable semantic spaces, each defined by a different set of psycholinguistic feature norms. We validate these interpretable spaces and then use them to automatically derive semantic characterizations of lexical items in two grammatical constructions: nouns in subject or object position within the same sentence, and the AANN construction (e.g., ‘a beautiful three days’). We show that a word in subject position is interpreted as more agentive than the very same word in object position, and that the nouns in the AANN construction are interpreted as more measurement-like than when in the canonical alternation. Our method can probe the distributional meaning of syntactic constructions at a templatic level, abstracted away from specific lexemes.

pdf bib
SAGEViz: SchemA GEneration and Visualization
Sugam Devare | Mahnaz Koupaee | Gautham Gunapati | Sayontan Ghosh | Sai Vallurupalli | Yash Kumar Lal | Francis Ferraro | Nathanael Chambers | Greg Durrett | Raymond Mooney | Katrin Erk | Niranjan Balasubramanian
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Schema induction involves creating a graph representation depicting how events unfold in a scenario. We present SAGEViz, an intuitive and modular tool that utilizes human-AI collaboration to create and update complex schema graphs efficiently, where multiple annotators (humans and models) can work simultaneously on a schema graph from any domain. The tool consists of two components: (1) a curation component powered by plug-and-play event language models to create and expand event sequences while human annotators validate and enrich the sequences to build complex hierarchical schemas, and (2) an easy-to-use visualization component to visualize schemas at varying levels of hierarchy. Using supervised and few-shot approaches, our event language models can continually predict relevant events starting from a seed event. We conduct a user study and show that users need less effort in terms of interaction steps with SAGEViz to generate schemas of better quality. We also include a video demonstrating the system.

2022

pdf bib
POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events
Sai Vallurupalli | Sayontan Ghosh | Katrin Erk | Niranjan Balasubramanian | Francis Ferraro
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowdworkers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQUe (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant’s influence over the event culmination.

pdf bib
longhorns at DADC 2022: How many linguists does it take to fool a Question Answering model? A systematic approach to adversarial attacks.
Venelin Kovatchev | Trina Chatterjee | Venkata S Govindarajan | Jifan Chen | Eunsol Choi | Gabriella Chronis | Anubrata Das | Katrin Erk | Matthew Lease | Junyi Jessy Li | Yating Wu | Kyle Mahowald
Proceedings of the First Workshop on Dynamic Adversarial Data Collection

Developing methods to adversarially challenge NLP systems is a promising avenue for improving both model performance and interpretability. Here, we describe the approach of the team “longhorns” on Task 1 of the First Workshop on Dynamic Adversarial Data Collection (DADC), which asked teams to manually fool a model on an Extractive Question Answering task. Our team finished first (pending validation), with a model error rate of 62%. We advocate for a systematic, linguistically informed approach to formulating adversarial questions, and we describe the results of our pilot experiments, as well as our official submission.

2021

pdf bib
Did they answer? Subjective acts and intents in conversational discourse
Elisa Ferracane | Greg Durrett | Junyi Jessy Li | Katrin Erk
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Discourse signals are often implicit, leaving it up to the interpreter to draw the required inferences. At the same time, discourse is embedded in a social context, meaning that interpreters apply their own assumptions and beliefs when resolving these inferences, leading to multiple, valid interpretations. However, current discourse data and frameworks ignore the social aspect, expecting only a single ground truth. We present the first discourse dataset with multiple and subjective interpretations of English conversation in the form of perceived conversation acts and intents. We carefully analyze our dataset and create computational models to (1) confirm our hypothesis that taking into account the bias of the interpreters leads to better predictions of the interpretations, and (2) show that disagreements are nuanced and require a deeper understanding of the different contextual factors. We share our dataset and code at http://github.com/elisaF/subjective_discourse.

pdf bib
How to marry a star: Probabilistic constraints for meaning in context
Katrin Erk | Aurélie Herbelot
Proceedings of the Society for Computation in Linguistics 2021

pdf bib
“Politeness, you simpleton!” retorted [MASK]: Masked prediction of literary characters
Eric Holgate | Katrin Erk
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

What is the best way to learn embeddings for entities, and what can be learned from them? We consider this question for the case of literary characters. We address the highly challenging task of guessing, from a sentence in the novel, which character is being talked about, and we probe the embeddings to see what information they encode about their literary characters. We find that when continuously trained, entity embeddings do well at the masked entity prediction task, and that they encode considerable information about the traits and characteristics of the entities.

2020

pdf bib
Leveraging WordNet Paths for Neural Hypernym Prediction
Yejin Cho | Juan Diego Rodriguez | Yifan Gao | Katrin Erk
Proceedings of the 28th International Conference on Computational Linguistics

We formulate the problem of hypernym prediction as a sequence generation task, where the sequences are taxonomy paths in WordNet. Our experiments with encoder-decoder models show that training to generate taxonomy paths can improve the performance of direct hypernym prediction. As a simple but powerful model, the hypo2path model achieves state-of-the-art performance, outperforming the best benchmark by 4.11 points in hit-at-one (H@1).
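
The hit-at-one (H@1) figure quoted above counts how often a model's top-ranked prediction is a correct hypernym. The sketch below illustrates that metric under stated assumptions: the function name, the data layout (a ranked candidate list per query word and a set of acceptable gold hypernyms), and the toy examples are hypothetical and not taken from the paper's evaluation code.

```python
def hit_at_one(ranked_predictions, gold):
    """Fraction of queries whose top-ranked candidate is a gold hypernym.

    ranked_predictions: dict mapping a query word to a list of candidates,
        best first.
    gold: dict mapping a query word to a set of acceptable hypernyms.
    """
    hits = 0
    for query, candidates in ranked_predictions.items():
        if candidates and candidates[0] in gold[query]:
            hits += 1
    return hits / len(ranked_predictions)

# Toy data, invented for illustration only.
toy_predictions = {
    "poodle": ["dog", "animal"],
    "oak": ["plant", "tree"],
    "sparrow": ["bird"],
    "trout": ["animal", "fish"],
}
toy_gold = {
    "poodle": {"dog"},
    "oak": {"tree"},
    "sparrow": {"bird"},
    "trout": {"fish"},
}

# Only "poodle" and "sparrow" have a correct top-ranked candidate.
toy_h1 = hit_at_one(toy_predictions, toy_gold)
```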

pdf bib
When is a bishop not like a rook? When it’s like a rabbi! Multi-prototype BERT embeddings for estimating semantic relationships
Gabriella Chronis | Katrin Erk
Proceedings of the 24th Conference on Computational Natural Language Learning

This paper investigates contextual language models, which produce token representations, as a resource for lexical semantics at the word or type level. We construct multi-prototype word embeddings from bert-base-uncased (Devlin et al., 2018). These embeddings retain contextual knowledge that is critical for some type-level tasks, while being less cumbersome and less subject to outlier effects than exemplar models. Similarity and relatedness estimation, both type-level tasks, benefit from this contextual knowledge, indicating the context-sensitivity of these processes. BERT’s token level knowledge also allows the testing of a type-level hypothesis about lexical abstractness, demonstrating the relationship between token-level phenomena and type-level concreteness ratings. Our findings provide important insight into the interpretability of BERT: layer 7 approximates semantic similarity, while the final layer (11) approximates relatedness.

pdf bib
Help! Need Advice on Identifying Advice
Venkata Subrahmanyan Govindarajan | Benjamin Chen | Rebecca Warholic | Katrin Erk | Junyi Jessy Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Humans use language to accomplish a wide variety of tasks - asking for and giving advice being one of them. In online advice forums, advice is mixed in with non-advice, like emotional support, and is sometimes stated explicitly, sometimes implicitly. Understanding the language of advice would equip systems with a better grasp of language pragmatics; practically, the ability to identify advice would drastically increase the efficiency of advice-seeking online, as well as advice-giving in natural language generation systems. We present a dataset in English from two Reddit advice forums - r/AskParents and r/needadvice - annotated for whether sentences in posts contain advice or not. Our analysis reveals rich linguistic phenomena in advice discourse. We present preliminary models showing that while pre-trained language models are able to capture advice better than rule-based systems, advice identification is challenging, and we identify directions for future research.

2019

pdf bib
Evaluating Discourse in Structured Text Representations
Elisa Ferracane | Greg Durrett | Junyi Jessy Li | Katrin Erk
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Discourse structure is integral to understanding a text and is helpful in many NLP tasks. Learning latent representations of discourse is an attractive alternative to acquiring expensive labeled discourse data. Liu and Lapata (2018) propose a structured attention mechanism for text classification that derives a tree over a text, akin to an RST discourse tree. We examine this model in detail, and evaluate on additional discourse-relevant tasks and datasets, in order to assess whether the structured attention improves performance on the end task and whether it captures a text’s discourse structure. We find the learned latent trees have little to no structure and instead focus on lexical cues; even after obtaining more structured trees with proposed model modifications, the trees are still far from capturing discourse structure when compared to discourse dependency trees from an existing discourse parser. Finally, ablation studies show the structured attention provides little benefit, sometimes even hurting performance.

pdf bib
Query-focused Scenario Construction
Su Wang | Greg Durrett | Katrin Erk
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The news coverage of events often contains not one but multiple incompatible accounts of what happened. We develop a query-based system that extracts compatible sets of events (scenarios) from such data, formulated as one-class clustering. Our system incrementally evaluates each event’s compatibility with already selected events, taking order into account. We use synthetic data consisting of article mixtures for scalable training and evaluate our model on a new human-curated dataset of scenarios about real-world news topics. Stronger neural network models and harder synthetic training settings are both important to achieve high performance, and our final scenario construction system substantially outperforms baselines based on prior work.

pdf bib
From News to Medical: Cross-domain Discourse Segmentation
Elisa Ferracane | Titan Page | Junyi Jessy Li | Katrin Erk
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. While we expectedly find a drop in performance, the nature of the segmentation errors suggests some problems can be addressed earlier in the pipeline, while others would require expanding the corpus to a trainable size to learn the nuances of the medical domain.

2018

pdf bib
Deep Neural Models of Semantic Shift
Alex Rosenfeld | Katrin Erk
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Diachronic distributional models track changes in word use over time. In this paper, we propose a deep neural network diachronic distributional model. Instead of modeling lexical change via a time series as is done in previous work, we represent time as a continuous variable and model a word’s usage as a function of time. We also create a novel synthetic task that quantitatively measures how well a model captures the semantic trajectory of a word over time. Finally, we explore how well the derivatives of our model can be used to measure the speed of lexical change.

pdf bib
Implicit Argument Prediction with Event Knowledge
Pengxiang Cheng | Katrin Erk
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Implicit arguments are not syntactically connected to their predicates, and are therefore hard to extract. Previous work has used models with large numbers of features, evaluated on very small datasets. We propose to train models for implicit argument prediction on a simple cloze task, for which data can be generated automatically at scale. This allows us to use a neural model, which draws on narrative coherence and entity salience for predictions. We show that our model has superior performance on both synthetic and natural data.

pdf bib
Modeling Semantic Plausibility by Injecting World Knowledge
Su Wang | Greg Durrett | Katrin Erk
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However, both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as man swallow paintball. Simple models based on distributional representations perform poorly on this task, despite doing well on selection preference, but injecting manually elicited knowledge about entity properties provides a substantial performance boost. Our error analysis shows that our new dataset is a great testbed for semantic plausibility models: more sophisticated knowledge representation and propagation could address many of the remaining errors.

pdf bib
Picking Apart Story Salads
Su Wang | Eric Holgate | Greg Durrett | Katrin Erk
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

During natural disasters and conflicts, information about what happened is often confusing and messy, and distributed across many sources. We would like to be able to automatically identify relevant information and assemble it into coherent narratives of what happened. To make this task accessible to neural models, we introduce Story Salads, mixtures of multiple documents that can be generated at scale. By exploiting the Wikipedia hierarchy, we can generate salads that exhibit challenging inference problems. Story salads give rise to a novel, challenging clustering task, where the objective is to group sentences from the same narratives. We demonstrate that simple bag-of-words similarity clustering falls short on this task, and that it is necessary to take into account global context and coherence.
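
The idea of a document mixture with recoverable source labels can be sketched as follows. This is only an illustration of the general setup, not the paper's generation procedure: the function name, the full shuffle of sentences, and the toy documents are assumptions, and the Wikipedia-based selection of which documents to mix is not shown.

```python
import random

def make_salad(doc_a, doc_b, seed=0):
    """Mix the sentences of two documents, keeping a source label for each.

    Returns (sentences, labels), where labels[i] is 0 if sentences[i] came
    from doc_a and 1 if it came from doc_b.
    """
    labeled = [(s, 0) for s in doc_a] + [(s, 1) for s in doc_b]
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    rng.shuffle(labeled)
    sentences = [s for s, _ in labeled]
    labels = [label for _, label in labeled]
    return sentences, labels

# Toy documents, invented for illustration only.
doc_flood = ["The river rose overnight.", "Residents were evacuated."]
doc_quake = ["The quake struck at dawn.", "Several buildings collapsed."]

salad_sentences, salad_labels = make_salad(doc_flood, doc_quake)
```

A clustering model would see only `salad_sentences`; `salad_labels` serves as the reference grouping when scoring its output.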

2017

pdf bib
Distributional Modeling on a Diet: One-shot Word Learning from Text Only
Su Wang | Stephen Roller | Katrin Erk
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We test whether distributional models can do one-shot learning of definitional properties from text only. Using Bayesian models, we find that first learning overarching structure in the known data, regularities in textual contexts and in properties, helps one-shot learning, and that individual context items can be highly informative.

2016

pdf bib
Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment
Stephen Roller | Katrin Erk
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
PIC a Different Word: A Simple Model for Lexical Substitution in Context
Stephen Roller | Katrin Erk
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Leveraging coreference to identify arms in medical abstracts: An experimental study
Elisa Ferracane | Iain Marshall | Byron C. Wallace | Katrin Erk
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis

pdf bib
Word Sense Clustering and Clusterability
Diana McCarthy | Marianna Apidianaki | Katrin Erk
Computational Linguistics, Volume 42, Issue 2 - June 2016

pdf bib
Representing Meaning with a Combination of Logical and Distributional Models
I. Beltagy | Stephen Roller | Pengxiang Cheng | Katrin Erk | Raymond J. Mooney
Computational Linguistics, Volume 42, Issue 4 - December 2016

pdf bib
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Katrin Erk | Noah A. Smith
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Katrin Erk | Noah A. Smith
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
On the Proper Treatment of Quantifiers in Probabilistic Logic Semantics
Islam Beltagy | Katrin Erk
Proceedings of the 11th International Conference on Computational Semantics

2014

pdf bib
UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic
Islam Beltagy | Stephen Roller | Gemma Boleda | Katrin Erk | Raymond Mooney
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Semantic Parsing using Distributional Semantics and Probabilistic Logic
Islam Beltagy | Katrin Erk | Raymond Mooney
Proceedings of the ACL 2014 Workshop on Semantic Parsing

pdf bib
Who Evoked that Frame? Some Thoughts on Context Effects and Event Types
Katrin Erk
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

pdf bib
Probabilistic Soft Logic for Semantic Textual Similarity
Islam Beltagy | Katrin Erk | Raymond Mooney
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Inclusive yet Selective: Supervised Distributional Hypernymy Detection
Stephen Roller | Katrin Erk | Gemma Boleda
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
What Substitutes Tell Us - Analysis of an “All-Words” Lexical Substitution Corpus
Gerhard Kremer | Katrin Erk | Sebastian Padó | Stefan Thater
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

pdf bib
Measuring Word Meaning in Context
Katrin Erk | Diana McCarthy | Nicholas Gaylord
Computational Linguistics, Volume 39, Issue 3 - September 2013

pdf bib
NAACL HLT 2013 Tutorial Abstracts
Jimmy Lin | Katrin Erk
NAACL HLT 2013 Tutorial Abstracts

bib
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers
Alexander Koller | Katrin Erk
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf bib
Towards a semantics for distributional representations
Katrin Erk
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

bib
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Short Papers
Alexander Koller | Katrin Erk
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Short Papers

pdf bib
Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
Islam Beltagy | Cuong Chau | Gemma Boleda | Dan Garrette | Katrin Erk | Raymond Mooney
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2011

pdf bib
Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
Elias Ponvert | Jason Baldridge | Katrin Erk
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Integrating Logical Representations with Probabilistic Information using Markov Logic
Dan Garrette | Katrin Erk | Raymond Mooney
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

pdf bib
Exemplar-Based Models for Word Meaning in Context
Katrin Erk | Sebastian Padó
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
Taesun Moon | Katrin Erk | Jason Baldridge
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the 5th International Workshop on Semantic Evaluation
Katrin Erk | Carlo Strapparava
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
What Is Word Meaning, Really? (And How Can Distributional Models Help Us Describe It?)
Katrin Erk
Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics

pdf bib
A Flexible, Corpus-Driven Model of Regular and Inverse Selectional Preferences
Katrin Erk | Sebastian Padó | Ulrike Padó
Computational Linguistics, Volume 36, Issue 4 - December 2010

2009

pdf bib
Graded Word Sense Assignment
Katrin Erk | Diana McCarthy
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Unsupervised morphological segmentation and clustering with document boundaries
Taesun Moon | Katrin Erk | Jason Baldridge
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Paraphrase Assessment in Structured Vector Space: Exploring Parameters and Datasets
Katrin Erk | Sebastian Padó
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

pdf bib
Representing words as regions in vector space
Katrin Erk
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

pdf bib
Measuring semantic relatedness with vector space models and random walks
Amaç Herdağdelen | Katrin Erk | Marco Baroni
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4)

pdf bib
Supporting inferences in semantic space: representing words as regions
Katrin Erk
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Investigations on Word Senses and Word Usages
Katrin Erk | Diana McCarthy | Nicholas Gaylord
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
Teaching Computational Linguistics to a Large, Diverse Student Body: Courses, Tools, and Interdepartmental Interaction
Jason Baldridge | Katrin Erk
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

pdf bib
A Structured Vector Space Model for Word Meaning in Context
Katrin Erk | Sebastian Padó
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
SemEval-2007 Task 19: Frame Semantic Structure Extraction
Collin Baker | Michael Ellsworth | Katrin Erk
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
A Simple, Similarity-based Model for Selectional Preferences
Katrin Erk
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
IGT-XML: An XML Format for Interlinearized Glossed Text
Alexis Palmer | Katrin Erk
Proceedings of the Linguistic Annotation Workshop

pdf bib
Flexible, Corpus-Based Modelling of Human Plausibility Judgements
Sebastian Padó | Ulrike Padó | Katrin Erk
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
The SALSA Corpus: a German Corpus Resource for Lexical Semantics
Aljoscha Burchardt | Katrin Erk | Anette Frank | Andrea Kowalski | Sebastian Padó | Manfred Pinkal
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the SALSA corpus, a large German corpus with manual role-semantic annotation, based on the syntactically annotated TIGER newspaper corpus. The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the annotation framework (frame semantics) and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications.

pdf bib
SALTO - A Versatile Multi-Level Annotation Tool
Aljoscha Burchardt | Katrin Erk | Anette Frank | Andrea Kowalski | Sebastian Padó
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we describe the SALTO tool. It was originally developed for the annotation of semantic roles in the frame semantics paradigm, but can be used for graphical annotation of treebanks with general relational information in a simple drag-and-drop fashion. The tool additionally supports corpus management and quality control.

pdf bib
Shalmaneser - A Toolchain For Shallow Semantic Parsing
Katrin Erk | Sebastian Padó
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents Shalmaneser, a software package for shallow semantic parsing, the automatic assignment of semantic classes and roles to free text. Shalmaneser is a toolchain of independent modules communicating through a common XML format. System output can be inspected graphically. Shalmaneser can be used either as a “black box” to obtain semantic parses for new datasets (classifiers for English and German frame-semantic analysis are included), or as a research platform that can be extended to new parsers, languages, or classification paradigms.

pdf bib
Unknown word sense detection as outlier detection
Katrin Erk
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
Analyzing Models for Semantic Role Assignment using Confusability
Katrin Erk | Sebastian Padó
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf bib
Semantic role labelling with similarity-based generalization using EM-based clustering
Ulrike Baldewein | Katrin Erk | Sebastian Padó | Detlef Prescher
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
Semantic Role Labelling With Chunk Sequences
Ulrike Baldewein | Katrin Erk | Sebastian Padó | Detlef Prescher
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004

pdf bib
A Powerful and Versatile XML Format for Representing Role-semantic Annotation
Katrin Erk | Sebastian Padó
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Querying Both Time-aligned and Hierarchical Corpora with NXT Search
Ulrich Heid | Holger Voormann | Jan-Torsten Milde | Ulrike Gut | Katrin Erk | Sebastian Padó
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
Towards a Resource for Lexical Semantics: A Large German Corpus with Extensive Semantic Annotation
Katrin Erk | Andrea Kowalski | Sebastian Padó | Manfred Pinkal
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Well-Nested Parallelism Constraints for Ellipsis Resolution
Katrin Erk | Joachim Niehren
10th Conference of the European Chapter of the Association for Computational Linguistics

2001

pdf bib
Underspecified Beta Reduction
Manuel Bodirsky | Katrin Erk | Alexander Koller | Joachim Niehren
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics