Marie-Catherine de Marneffe

Also published as: Marie Catherine de Marneffe, Marie-Catherine De marneffe, Marie-Catherine De Marneffe

2024

pdf bib
Sensibilité des explications à l’aléa des grands modèles de langage : le cas de la classification de textes journalistiques [Sensitivity of Explanations to the Randomness of Large Language Models: a Case Study on Journalistic Text Classification]
Jérémie Bogaert | Marie-Catherine de Marneffe | Antonin Descampe | Louis Escouflaire | Cédrick Fairon | François-Xavier Standaert
Traitement Automatique des Langues, Volume 64, Numéro 3 : Explicabilité des modèles de TAL [Explainability of NLP models]

pdf bib abs
VariErr NLI: Separating Annotation Error from Human Label Variation
Leon Weber-Genzel | Siyao Peng | Marie-Catherine De Marneffe | Barbara Plank
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Human label variation arises when annotators assign different labels to the same item for valid reasons, while annotation errors occur when labels are assigned for invalid reasons. These two issues are prevalent in NLP benchmarks, yet existing research has studied them in isolation. To the best of our knowledge, there exists no prior work that focuses on teasing apart error from signal, especially in cases where signal is beyond black-and-white. To fill this gap, we introduce a systematic methodology and a new dataset, VariErr (variation versus error), focusing on the NLI task in English. We propose a 2-round annotation procedure with annotators explaining each label and subsequently judging the validity of label-explanation pairs. VariErr contains 7,732 validity judgments on 1,933 explanations for 500 re-annotated MNLI items. We assess the effectiveness of various automatic error detection (AED) methods and GPTs in uncovering errors versus human label variation. We find that state-of-the-art AED methods significantly underperform GPTs and humans. While GPT-4 is the best system, it still falls short of human performance. Our methodology is applicable beyond NLI, offering fertile ground for future research on error versus plausible variation, which in turn can yield better and more trustworthy NLP systems.

pdf bib abs
Insights of a Usability Study for KBQA Interactive Semantic Parsing: Generation Yields Benefits over Templates but External Validity Remains Challenging
Ashley Lewis | Lingbo Mo | Marie-Catherine de Marneffe | Huan Sun | Michael White
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024

We present our findings from a usability study of an interactive semantic parsing system for knowledge based question answering (KBQA). The system is designed to help users access information within a knowledge base without having to know its query language. The system translates the user’s question into the query language, retrieves an answer, then presents an English explanation of the process so that the user can make corrections if necessary. To our knowledge, our work is the most thorough usability study conducted for such a system and the only one that uses crowdworkers as participants to verify that the system is usable for average users. Our crowdworkers participate in KBQA dialogues using 4 versions of a system based on the framework by Mo et al. (2022) and answer surveys about their experiences. Some key takeaways from this work are: 1) we provide evidence for the benefits of interactivity in semantic parsing with human users and using generated questions in lieu of templated representations, 2) we identify limitations of simulations and provide contrasting evidence from actual system use, and 3) we provide an examination of crowdsourcing methodology, in particular the trade-offs of using crowdworkers vs. a specially trained group of evaluators.

This paper presents the objectives, organization and activities of the UniDive COST Action, a scientific network dedicated to universality, diversity and idiosyncrasy in language technology. We describe the objectives and organization of this initiative, the people involved, the working groups and the ongoing tasks and activities. This paper is also an open call for participation towards new members and countries.

2023

pdf bib abs
Ecologically Valid Explanations for Label Variation in NLI
Nan-Jiang Jiang | Chenhao Tan | Marie-Catherine de Marneffe
Findings of the Association for Computational Linguistics: EMNLP 2023

Human label variation, or annotation disagreement, exists in many natural language processing (NLP) tasks, including natural language inference (NLI). To gain direct evidence of how NLI label variation arises, we build LiveNLI, an English dataset of 1,415 ecologically valid explanations (annotators explain the NLI labels they chose) for 122 MNLI items (at least 10 explanations per item). The LiveNLI explanations confirm that people can systematically vary on their interpretation and highlight within-label variation: annotators sometimes choose the same label for different reasons. This suggests that explanations are crucial for navigating label interpretations in general. We few-shot prompt large language models to generate explanations but the results are inconsistent: they sometimes produce valid and informative explanations, but they also generate implausible ones that do not support the label, highlighting directions for improvement.

2022

pdf bib
Findings of the Association for Computational Linguistics: NAACL 2022
Marine Carpuat | Marie-Catherine de Marneffe | Ivan Vladimir Meza Ruiz
Findings of the Association for Computational Linguistics: NAACL 2022

pdf bib
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Marine Carpuat | Marie-Catherine de Marneffe | Ivan Vladimir Meza Ruiz
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib abs
Investigating Reasons for Disagreement in Natural Language Inference
Nan-Jiang Jiang | Marie-Catherine de Marneffe
Transactions of the Association for Computational Linguistics, Volume 10

We investigate how disagreement in natural language inference (NLI) annotation arises. We developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level classes. We found that some disagreements are due to uncertainty in the sentence meaning, others to annotator biases and task artifacts, leading to different interpretations of the label distribution. We explore two modeling approaches for detecting items with potential disagreement: a 4-way classification with a “Complicated” label in addition to the three standard NLI labels, and a multilabel classification approach. We found that the multilabel classification is more expressive and gives better recall of the possible interpretations in the data.
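
To make the contrast between the two target encodings in this abstract concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names, the `min_agreement` threshold and the `min_votes` cutoff are hypothetical choices, and the abstract does not say how targets were derived from annotations.

```python
from collections import Counter

# The three standard NLI labels named in the abstract.
NLI_LABELS = ("entailment", "neutral", "contradiction")


def four_way_target(annotations, min_agreement=0.8):
    """Single-label target: the majority label, or "complicated".

    min_agreement is a hypothetical threshold on the share of
    annotators who must pick the majority label.
    """
    counts = Counter(annotations)
    label, top = counts.most_common(1)[0]
    if top / len(annotations) >= min_agreement:
        return label
    return "complicated"


def multilabel_target(annotations, min_votes=2):
    """Multilabel target: one 0/1 flag per standard NLI label.

    min_votes is a hypothetical cutoff for counting a label as a
    plausible interpretation of the item.
    """
    counts = Counter(annotations)
    return [1 if counts[label] >= min_votes else 0 for label in NLI_LABELS]


votes = ["entailment", "entailment", "neutral", "neutral", "entailment"]
four_way = four_way_target(votes)      # one class for the whole item
multilabel = multilabel_target(votes)  # one flag per NLI label
```

Under these assumed thresholds, the 4-way encoding collapses the item into the extra class, while the multilabel encoding keeps track of which interpretations received support. This matches the abstract's remark that the multilabel approach is more expressive.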

pdf bib abs
CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification?
Rodrigo Wilkens | David Alfter | Rémi Cardon | Isabelle Gribomont | Adrien Bibal | Watrin Patrick | Marie-Catherine De marneffe | Thomas François
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Lexical simplification is the task of substituting a difficult word with a simpler equivalent for a target audience. This is currently commonly done by modeling lexical complexity on a continuous scale to identify simpler alternatives to difficult words. In the TSAR shared task, the organizers call for systems capable of generating substitutions in a zero-shot-task context, for English, Spanish and Portuguese. In this paper, we present the solution we (the cental team) proposed for the task. We explore the ability of BERT-like models to generate substitution words by masking the difficult word. To do so, we investigate various context enhancement strategies, that we combined into an ensemble method. We also explore different substitution ranking methods. We report on a post-submission analysis of the results and present our insights for potential improvements. The code for all our experiments is available at https://gitlab.com/Cental-FR/cental-tsar2022.

2021

pdf bib abs
Universal Dependencies
Marie-Catherine de Marneffe | Christopher D. Manning | Joakim Nivre | Daniel Zeman
Computational Linguistics, Volume 47, Issue 2 - June 2021

Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for crosslinguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.

pdf bib abs
Identifying inherent disagreement in natural language inference
Xinliang Frederick Zhang | Marie-Catherine de Marneffe
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural language inference (NLI) is the task of determining whether a piece of text is entailed, contradicted by or unrelated to another piece of text. In this paper, we investigate how to tease systematic inferences (i.e., items for which people agree on the NLI label) apart from disagreement items (i.e., items which lead to different annotations), which most prior work has overlooked. To distinguish systematic inferences from disagreement items, we propose Artificial Annotators (AAs) to simulate the uncertainty in the annotation process by capturing the modes in annotations. Results on the CommitmentBank, a corpus of naturally occurring discourses in English, confirm that our approach performs statistically significantly better than all baselines. We further show that AAs learn linguistic patterns and context-dependent reasoning.

pdf bib abs
He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics
Nanjiang Jiang | Marie-Catherine de Marneffe
Transactions of the Association for Computational Linguistics, Volume 9

We investigate how well BERT performs on predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains a strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high performance suggests, we are still far from having a robust system for factuality prediction.

2020

pdf bib abs
Contextualized Embeddings for Enriching Linguistic Analyses on Politeness
Ahmad Aljanaideh | Eric Fosler-Lussier | Marie-Catherine de Marneffe
Proceedings of the 28th International Conference on Computational Linguistics

Linguistic analyses in natural language processing (NLP) have often been performed around the static notion of words where the context (surrounding words) is not considered. For example, previous analyses on politeness have focused on comparing the use of static words such as personal pronouns across (im)polite requests without taking the context of those words into account. Current word embeddings in NLP do capture context and thus can be leveraged to enrich linguistic analyses. In this work, we introduce a model which leverages the pre-trained BERT model to cluster contextualized representations of a word based on (1) the context in which the word appears and (2) the labels of items the word occurs in. Using politeness as case study, this model is able to automatically discover interpretable, fine-grained context patterns of words, some of which align with existing theories on politeness. Our model further discovers novel finer-grained patterns associated with (im)polite language. For example, the word please can occur in impolite contexts that are predictable from BERT clustering. The approach proposed here is validated by showing that features based on fine-grained patterns inferred from the clustering improve over politeness-word baselines.

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the universal guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.

pdf bib
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Marie-Catherine de Marneffe | Miryam de Lhoneux | Joakim Nivre | Sebastian Schuster
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

2019

pdf bib
Conversion et améliorations de corpus du français annotés en Universal Dependencies [Conversion and Improvement of Universal Dependencies French corpora]
Bruno Guillaume | Marie-Catherine de Marneffe | Guy Perrier
Traitement Automatique des Langues, Volume 60, Numéro 2 : Corpus annotés [Annotated corpora]

pdf bib abs
Evaluating BERT for natural language inference: A case study on the CommitmentBank
Nanjiang Jiang | Marie-Catherine de Marneffe
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However, analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement.

pdf bib abs
Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities
Alexander Erdmann | David Joseph Wrisley | Benjamin Allen | Christopher Brown | Sophie Cohen-Bodénès | Micha Elsner | Yukun Feng | Brian Joseph | Béatrice Joyeux-Prunel | Marie-Catherine de Marneffe
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Scholars in inter-disciplinary fields like the Digital Humanities are increasingly interested in semantic annotation of specialized corpora. Yet, under-resourced languages, imperfect or noisily structured data, and user-specific classification tasks make it difficult to meet their needs using off-the-shelf models. Manual annotation of large corpora from scratch, meanwhile, can be prohibitively expensive. Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. Our system robustly handles any domain or user-defined label set and requires no external resources, enabling quality named entity recognition for Humanities corpora where such resources are not available. Evaluating on typologically disparate languages and datasets, we reduce required annotation by 20-60% and greatly outperform a competitive active learning baseline.

pdf bib abs
Do You Know That Florence Is Packed with Visitors? Evaluating State-of-the-art Models of Speaker Commitment
Nanjiang Jiang | Marie-Catherine de Marneffe
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

When a speaker, Mary, asks “Do you know that Florence is packed with visitors?”, we take her to believe that Florence is packed with visitors, but not if she asks “Do you think that Florence is packed with visitors?”. Inferring speaker commitment (aka event factuality) is crucial for information extraction and question answering. Here, we explore the hypothesis that linguistic deficits drive the error patterns of existing speaker commitment models by analyzing the linguistic correlates of model error on a challenging naturalistic dataset. We evaluate two state-of-the-art speaker commitment models on the CommitmentBank, an English dataset of naturally occurring discourses. The CommitmentBank is annotated with speaker commitment towards the content of the complement (“Florence is packed with visitors” in our example) of clause-embedding verbs (“know”, “think”) under four entailment-canceling environments (negation, modal, question, conditional). A breakdown of items by linguistic features reveals asymmetrical error patterns: while the models achieve good performance on some classes (e.g., negation), they fail to generalize to the diverse linguistic constructions (e.g., conditionals) in natural language, highlighting directions for improvement.

2018

pdf bib abs
QED: A fact verification system for the FEVER shared task
Jackson Luken | Nanjiang Jiang | Marie-Catherine de Marneffe
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

This paper describes our system submission to the 2018 Fact Extraction and VERification (FEVER) shared task. The system uses a heuristics-based approach for evidence extraction and a modified version of the inference model by Parikh et al. (2016) for classification. Our process is broken down into three modules: potentially relevant documents are gathered based on key phrases in the claim, then any possible evidence sentences inside those documents are extracted, and finally our classifier discards any evidence deemed irrelevant and uses the remaining to classify the claim’s veracity. Our system beats the shared task baseline by 12% and is successful at finding correct evidence (evidence retrieval F1 of 62.5% on the development set).

pdf bib
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Marie-Catherine de Marneffe | Teresa Lynn | Sebastian Schuster
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

2017

pdf bib abs
“i have a feeling trump will win..................”: Forecasting Winners and Losers from User Predictions on Twitter
Sandesh Swamy | Alan Ritter | Marie-Catherine de Marneffe
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Social media users often make explicit predictions about upcoming events. Such statements vary in the degree of certainty the author expresses toward the outcome: “Leonardo DiCaprio will win Best Actor” vs. “Leonardo DiCaprio may win” or “No way Leonardo wins!”. Can popular beliefs on social media predict who will win? To answer this question, we build a corpus of tweets annotated for veridicality on which we train a log-linear classifier that detects positive veridicality with high precision. We then forecast uncertain outcomes using the wisdom of crowds, by aggregating users’ explicit predictions. Our method for forecasting winners is fully automated, relying only on a set of contenders as input. It requires no training data of past outcomes and outperforms sentiment and tweet volume baselines on a broad range of contest prediction tasks. We further demonstrate how our approach can be used to measure the reliability of individual accounts’ predictions and retrospectively identify surprise outcomes.
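
The aggregation step described in this abstract can be sketched as a simple vote count. This is a rough illustration under stated assumptions, not the paper's method: `forecast_winner` is a hypothetical helper, and the `is_positive` flag stands in for the output of a veridicality classifier that is not shown here.

```python
from collections import Counter


def forecast_winner(predictions, contenders):
    """Pick the contender with the most positive user predictions.

    predictions: iterable of (contender, is_positive) pairs, where
    is_positive is assumed to come from a veridicality classifier.
    contenders: the only required input besides the tweets.
    Returns (winner, shares), or (None, {}) if no positive
    prediction mentions any contender.
    """
    votes = Counter({name: 0 for name in contenders})
    for contender, is_positive in predictions:
        if is_positive and contender in votes:
            votes[contender] += 1
    total = sum(votes.values())
    if total == 0:
        return None, {}
    shares = {name: count / total for name, count in votes.items()}
    return max(shares, key=shares.get), shares


# Toy data: contender names and labels are made up for illustration.
toy_predictions = [
    ("A", True),
    ("B", True),
    ("A", True),
    ("B", False),  # negative or uncertain prediction, not counted
    ("C", True),
]
winner, shares = forecast_winner(toy_predictions, ["A", "B", "C"])
```

The sketch needs no record of past outcomes, only a contender list and labeled predictions, which is the property the abstract emphasizes.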

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf bib
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)
Marie-Catherine de Marneffe | Joakim Nivre | Sebastian Schuster
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)

pdf bib abs
Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems
Taylor Mahler | Willy Cheung | Micha Elsner | David King | Marie-Catherine de Marneffe | Cory Shain | Symon Stevens-Guille | Michael White
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper describes our “breaker” submission to the 2017 EMNLP “Build It Break It” shared task on sentiment analysis. In order to cause the “builder” systems to make incorrect predictions, we edited items in the blind test data according to linguistically interpretable strategies that allow us to assess the ease with which the builder systems learn various components of linguistic structure. On the whole, our submitted pairs break all systems at a high rate (72.6%), indicating that sentiment analysis as an NLP task may still have a lot of ground to cover. Of the breaker strategies that we consider, we find our semantic and pragmatic manipulations to pose the most substantial difficulties for the builder systems.

pdf bib
Assessing the Annotation Consistency of the Universal Dependencies Corpora
Marie-Catherine de Marneffe | Matias Grioni | Jenna Kanerva | Filip Ginter
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments. It is also useful for multilingual system development and comparative linguistic studies. Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. In this paper, we describe v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages.

pdf bib
Adjusting Word Embeddings with Semantic Intensity Orders
Joo-Kyung Kim | Marie-Catherine de Marneffe | Eric Fosler-Lussier
Proceedings of the 1st Workshop on Representation Learning for NLP

pdf bib
Identification, characterization, and grounding of gradable terms in clinical text
Chaitanya Shivade | Marie-Catherine de Marneffe | Eric Fosler-Lussier | Albert M. Lai
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

pdf bib abs
Results of the WNUT16 Named Entity Recognition Shared Task
Benjamin Strauss | Bethany Toma | Alan Ritter | Marie-Catherine de Marneffe | Wei Xu
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

This paper presents the results of the Twitter Named Entity Recognition shared task associated with W-NUT 2016: a named entity tagging task with 10 teams participating. We outline the shared task, annotation process and dataset statistics, and provide a high-level overview of the participating systems for each shared task.

Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.

2015

pdf bib
The Overall Markedness of Discourse Relations
Lifeng Jin | Marie-Catherine de Marneffe
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Corpus-based discovery of semantic intensity scales
Chaitanya Shivade | Marie-Catherine de Marneffe | Eric Fosler-Lussier | Albert M. Lai
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
I do not disagree: leveraging monolingual alignment to detect disagreement in dialogue
Ajda Gokcen | Marie-Catherine de Marneffe
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Extending NegEx with Kernel Methods for Negation Detection in Clinical Text
Chaitanya Shivade | Marie-Catherine de Marneffe | Eric Fosler-Lussier | Albert M. Lai
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)

pdf bib
Neural word embeddings with multiplicative feature interactions for tensor-based compositions
Joo-Kyung Kim | Marie-Catherine de Marneffe | Eric Fosler-Lussier
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

pdf bib
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition
Timothy Baldwin | Marie Catherine de Marneffe | Bo Han | Young-Bum Kim | Alan Ritter | Wei Xu
Proceedings of the Workshop on Noisy User-generated Text

2014

pdf bib abs
Universal Stanford dependencies: A cross-linguistic typology
Marie-Catherine de Marneffe | Timothy Dozat | Natalia Silveira | Katri Haverinen | Filip Ginter | Joakim Nivre | Christopher D. Manning
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Revisiting the now de facto standard Stanford dependency representation, we propose an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. We suggest a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. We emphasize the lexicalist stance of the Stanford Dependencies, which leads to a particular, partially new treatment of compounding, prepositions, and morphology. We show how existing dependency schemes for several languages map onto the universal taxonomy proposed here and close with consideration of practical implications of dependency representation choices for NLP applications, in particular parsing.

We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies formalism. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for English informal text genres. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating the quality of different versions of the Stanford Parser, which includes a converter tool to produce dependency annotation from constituency trees. We show that training a dependency parser on a mix of newswire and web data leads to better performance on that type of data without hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be a valuable resource for parsing. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser’s dependency converter. In response to the challenges encountered by annotators in the EWT corpus, the formalism has been revised and extended, and the converter has been improved.

2010

We investigate a number of approaches to generating Stanford Dependencies, a widely used semantically-oriented dependency representation. We examine algorithms specifically designed for dependency parsing (Nivre, Nivre Eager, Covington, Eisner, and RelEx) as well as dependencies extracted from constituent parse trees created by phrase structure parsers (Charniak, Charniak-Johnson, Bikel, Berkeley and Stanford). We found that constituent parsers systematically outperform algorithms designed specifically for dependency parsing. The most accurate method for generating dependencies is the Charniak-Johnson reranking parser, with 89% (labeled) attachment F1 score. The fastest methods are Nivre, Nivre Eager, and Covington, used with a linear classifier to make local parsing decisions, which can parse the entire Penn Treebank development set (section 22) in less than 10 seconds on an Intel Xeon E5520. However, this speed comes with a substantial drop in F1 score (about 76% for labeled attachment) compared to competing methods. By tuning how much of the search space is explored by the Charniak-Johnson parser, we are able to arrive at a balanced configuration that is both fast and nearly as good as the most accurate approaches.

pdf bib
“Was It Good? It Was Provocative.” Learning the Meaning of Scalar Adjectives
Marie-Catherine de Marneffe | Christopher D. Manning | Christopher Potts
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

pdf bib
Multi-word expressions in textual inference: Much ado about nothing?
Marie-Catherine de Marneffe | Sebastian Padó | Christopher D. Manning
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

pdf bib
Not a Simple Yes or No: Uncertainty in Indirect Answers
Marie-Catherine de Marneffe | Scott Grimm | Christopher Potts
Proceedings of the SIGDIAL 2009 Conference

2008

pdf bib
Finding Contradictions in Text
Marie-Catherine de Marneffe | Anna N. Rafferty | Christopher D. Manning
Proceedings of ACL-08: HLT

pdf bib
The Stanford Typed Dependencies Representation
Marie-Catherine de Marneffe | Christopher D. Manning
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

2007

2006

pdf bib abs
Generating Typed Dependency Parses from Phrase Structure Parses
Marie-Catherine de Marneffe | Bill MacCartney | Christopher D. Manning
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes a system for extracting typed dependency parses of English sentences from phrase structure parses. In order to capture inherent relations occurring in corpus texts that can be critical in real-world applications, many NP relations are included in the set of grammatical relations used. We provide a comparison of our system with Minipar and the Link parser. The typed dependency extraction facility described here is integrated in the Stanford Parser, available for download.

pdf bib
Learning to recognize features of valid textual entailments
Bill MacCartney | Trond Grenager | Marie-Catherine de Marneffe | Daniel Cer | Christopher D. Manning
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference