Michael Collins

Also published as: Michael John Collins, Mike Collins


2024

pdf bib
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
Alon Jacovi | Yonatan Bitton | Bernd Bohnet | Jonathan Herzig | Or Honovich | Michael Tseng | Michael Collins | Roee Aharoni | Mor Geva
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Prompting language models to provide step-by-step answers (e.g., “Chain-of-Thought”) is the prominent approach for complex reasoning tasks, where more accurate reasoning chains typically improve downstream task performance. Recent literature discusses automatic methods to verify reasoning to evaluate and improve their correctness. However, no fine-grained step-level datasets are available to enable thorough evaluation of such verification methods, hindering progress in this direction. We introduce REVEAL: Reasoning Verification Evaluation, a dataset to benchmark automatic verifiers of complex Chain-of-Thought reasoning in open-domain question-answering settings. REVEAL includes comprehensive labels for the relevance, attribution to evidence passages, and logical correctness of each reasoning step in a language model’s answer, across a variety of datasets and state-of-the-art language models. Evaluation on REVEAL shows that verifiers struggle at verifying reasoning chains - in particular, verifying logical correctness and detecting contradictions. Available at https://reveal-dataset.github.io/ .

2023

pdf bib
Query Refinement Prompts for Closed-Book Long-Form QA
Reinald Kim Amplayo | Kellie Webster | Michael Collins | Dipanjan Das | Shashi Narayan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have been shown to perform well in answering questions and in producing long-form texts, both in few-shot closed-book settings. While the former can be validated using well-known evaluation metrics, the latter is difficult to evaluate. We resolve the difficulties to evaluate long-form output by doing both tasks at once – to do question answering that requires long-form answers. Such questions tend to be multifaceted, i.e., they may have ambiguities and/or require information from multiple sources. To this end, we define query refinement prompts that encourage LLMs to explicitly express the multifacetedness in questions and generate long-form answers covering multiple facets of the question. Our experiments on two long-form question answering datasets, ASQA and AQuAMuSe, show that using our prompts allows us to outperform fully finetuned models in the closed book setting, as well as achieve results comparable to retrieve-then-generate open-book models.

pdf bib
Measuring Attribution in Natural Language Generation Models
Hannah Rashkin | Vitaly Nikolaev | Matthew Lamm | Lora Aroyo | Michael Collins | Dipanjan Das | Slav Petrov | Gaurav Singh Tomar | Iulia Turc | David Reitter
Computational Linguistics, Volume 49, Issue 4 - December 2023

Large neural models have brought a new challenge to natural language generation (NLG): It has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world is to be verified against an independent, provided source. We define AIS and a two-stage annotation pipeline for allowing annotators to evaluate model output according to annotation guidelines. We successfully validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset). We provide full annotation guidelines in the appendices and publicly release the annotated data at https://github.com/google-research-datasets/AIS.

pdf bib
Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
Thomas Effland | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 11

We present Expected Statistic Regularization (ESR), a novel regularization technique that utilizes low-order multi-task structural statistics to shape model distributions for semi-supervised learning on low-resource datasets. We study ESR in the context of cross-lingual transfer for syntactic analysis (POS tagging and labeled dependency parsing) and present several classes of low-order statistic functions that bear on model behavior. Experimentally, we evaluate the proposed statistics with ESR for unsupervised transfer on 5 diverse target languages and show that all statistics, when estimated accurately, yield improvements to both POS and LAS, with the best statistic improving POS by +7.0 and LAS by +8.5 on average. We also present semi-supervised transfer and learning curve experiments that show ESR provides significant gains over strong cross-lingual-transfer-plus-fine-tuning baselines for modest amounts of label data. These results indicate that ESR is a promising and complementary approach to model-transfer approaches for cross-lingual parsing.

pdf bib
Coreference Resolution through a seq2seq Transition-Based System
Bernd Bohnet | Chris Alberti | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 11

Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly. We implement the coreference system as a transition system and use multilingual T5 as an underlying language model. We obtain state-of-the-art accuracy on the CoNLL-2012 datasets with 83.3 F1-score for English (a 2.3 higher F1-score than previous work [Dobrovolskii, 2021]) using only CoNLL data for training, 68.5 F1-score for Arabic (+4.1 higher than previous work), and 74.3 F1-score for Chinese (+5.3). In addition we use the SemEval-2010 data sets for experiments in the zero-shot setting, a few-shot setting, and supervised setting using all available training data. We obtain substantially higher zero-shot F1-scores for 3 out of 4 languages than previous approaches and significantly exceed previous supervised state-of-the-art results for all five tested languages. We provide the code and models as open source.

2022

pdf bib
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan | Gonçalo Simões | Yao Zhao | Joshua Maynez | Dipanjan Das | Michael Collins | Mirella Lapata
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural generation models (FROST, Narayan et al., 2021) that are trained to first create a composition of the output and then generate by conditioning on it and the input. Our approach avoids text degeneration by first sampling a composition in the form of an entity chain and then using beam search to generate the best possible text grounded to this entity chain. Experiments on summarization (CNN/DailyMail and XSum) and question generation (SQuAD), using existing and newly proposed automatic metrics together with human-based evaluation, demonstrate that Composition Sampling is currently the best available decoding strategy for generating diverse meaningful outputs.

pdf bib
Evaluating Explanations: How Much Do Explanations from the Teacher Aid Students?
Danish Pruthi | Rachit Bansal | Bhuwan Dhingra | Livio Baldini Soares | Michael Collins | Zachary C. Lipton | Graham Neubig | William W. Cohen
Transactions of the Association for Computational Linguistics, Volume 10

While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the student during training, but are not available at test time. Compared with prior proposals, our approach is less easily gamed, enabling principled, automatic, model-agnostic evaluation of attributions. Using our framework, we compare numerous attribution methods for text classification and question answering, and observe quantitative differences that are consistent (to a moderate to high degree) across different student model architectures and learning strategies.

2021

pdf bib
Sparse, Dense, and Attentional Representations for Text Retrieval
Yi Luan | Jacob Eisenstein | Kristina Toutanova | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 9

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish connections between the encoding dimension, the margin between gold and lower-ranked documents, and the document length, suggesting limitations in the capacity of fixed-length encodings to support precise retrieval of long documents. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and explore sparse-dense hybrids to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in large-scale retrieval.
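
The scoring rule described in the abstract above (each document scored by the inner product of its vector with the query vector) can be illustrated with a minimal sketch. The function names `inner_product` and `rank_documents` and the toy vectors are assumptions made for illustration; they are not taken from the paper, and the vectors here stand in for the output of trained encoders.

```python
from typing import List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices sorted by descending inner-product score."""
    scores = [inner_product(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical fixed-length encodings (in practice produced by the encoders).
query = [1.0, 0.0, 2.0]
docs = [
    [0.5, 0.1, 0.0],  # score 0.5
    [0.0, 1.0, 1.5],  # score 3.0
    [1.0, 1.0, 0.5],  # score 2.0
]
ranking = rank_documents(query, docs)
print(ranking)
```

Because every document is compressed to one fixed-length vector before the query is seen, scoring is cheap, which is the efficiency the abstract contrasts with more costly attentional architectures.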

pdf bib
Decontextualization: Making Sentences Stand-Alone
Eunsol Choi | Jennimaria Palomaki | Matthew Lamm | Tom Kwiatkowski | Dipanjan Das | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 9

Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user-facing task, and as preprocessing for systems that perform document understanding. We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.

pdf bib
QED: A Framework and Dataset for Explanations in Question Answering
Matthew Lamm | Jennimaria Palomaki | Chris Alberti | Daniel Andor | Eunsol Choi | Livio Baldini Soares | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 9

A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility, and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and answer according to formal semantic notions such as referential equality, sentencehood, and entailment. We describe and publicly release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset, and report baseline models on two tasks—post-hoc explanation generation given an answer, and joint question answering and explanation generation. In the joint setting, a promising result suggests that training on a relatively small amount of QED data can improve question answering. In addition to describing the formal, language-theoretic motivations for the QED approach, we describe a large user study showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.

pdf bib
Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss
Thomas Effland | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 9

We study learning named entity recognizers in the presence of missing entity annotations. We approach this setting as tagging with latent variables and propose a novel loss, the Expected Entity Ratio, to learn models in the presence of systematically missing tags. We show that our approach is both theoretically sound and empirically useful. Experimentally, we find that it meets or exceeds performance of strong and state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. In particular, we find that it significantly outperforms the previous state-of-the-art methods from Mayhew et al. (2019) and Li et al. (2021) by +12.7 and +2.3 F1 score in a challenging setting with only 1,000 biased annotations, averaged across 7 datasets. We also show that, when combined with our approach, a novel sparse annotation scheme outperforms exhaustive annotation for modest annotation budgets.

pdf bib
Investigating the Effect of Background Knowledge on Natural Questions
Vidhisha Balachandran | Bhuwan Dhingra | Haitian Sun | Michael Collins | William Cohen
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Existing work shows the benefits of integrating KBs with textual evidence for QA only on questions that are answerable by KBs alone (Sun et al., 2019). In contrast, real world QA systems often have to deal with questions that might not be directly answerable by KBs. Here, we investigate the effect of integrating background knowledge from KBs for the Natural Questions (NQ) task. We create a subset of the NQ data, Factual Questions (FQ), where the questions have evidence in the KB in the form of paths that link question entities to answer entities but still must be answered using text, to facilitate further research into KB integration methods. We propose and analyze a simple, model-agnostic approach for incorporating KB paths into text-based QA systems and establish a strong upper bound on FQ for our method using an oracle retriever. We show that several variants of Personalized PageRank based fact retrievers lead to a low recall of answer entities and consequently fail to improve QA performance. Our results suggest that fact retrieval is a bottleneck for integrating KBs into real world QA datasets.

2020

pdf bib
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Jonathan H. Clark | Eunsol Choi | Michael Collins | Dan Garrette | Tom Kwiatkowski | Vitaly Nikolaev | Jennimaria Palomaki
Transactions of the Association for Computational Linguistics, Volume 8

Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA—a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology—the set of linguistic features each language expresses—such that we expect models performing well on this set to generalize across a large number of the world’s languages. We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer, but don’t know the answer yet, and the data is collected directly in each language without the use of translation.

pdf bib
Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios
Ramy Eskander | Smaranda Muresan | Michael Collins
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We describe a fully unsupervised cross-lingual transfer approach for part-of-speech (POS) tagging under a truly low resource scenario. We assume access to parallel translations between the target language and one or more source languages for which POS taggers are available. We use the Bible as parallel data in our experiments: small size, out-of-domain and covering many diverse languages. Our approach innovates in three ways: 1) a robust approach of selecting training instances via cross-lingual annotation projection that exploits best practices of unsupervised type and token constraints, word-alignment confidence and density of projected POS, 2) a Bi-LSTM architecture that uses contextualized word embeddings, affix embeddings and hierarchical Brown clusters, and 3) an evaluation on 12 diverse languages in terms of language family and morphological typology. In spite of the use of limited and out-of-domain parallel data, our experiments demonstrate significant improvements in accuracy over previous work. In addition, we show that using multi-source information, either via projection or output combination, improves the performance for most target languages.

2019

pdf bib
Synthetic QA Corpora Generation with Roundtrip Consistency
Chris Alberti | Daniel Andor | Emily Pitler | Jacob Devlin | Michael Collins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. By pretraining on the resulting corpora we obtain significant improvements on SQuAD2 and NQ, establishing a new state-of-the-art on the latter. Our synthetic data generation models, for both question generation and answer extraction, can be fully reproduced by finetuning a publicly available BERT model on the extractive subsets of SQuAD2 and NQ. We also describe a more powerful variant that does full sequence-to-sequence pretraining for question generation, obtaining exact match and F1 at less than 0.1% and 0.4% from human performance on SQuAD2.
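
One way to read the roundtrip-consistency filter mentioned in the abstract above is sketched below: generate a question for a candidate answer, answer that question again, and keep the pair only if the original answer is recovered. This is a hedged illustration, not the authors' implementation. The function `roundtrip_filter` and the two toy stand-in models are hypothetical, and exact string equality is an assumed matching criterion.

```python
from typing import Callable, Iterable, List, Tuple


def roundtrip_filter(
    passage: str,
    candidate_answers: Iterable[str],
    generate_question: Callable[[str, str], str],
    answer_question: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Keep (question, answer) pairs whose answer survives the roundtrip."""
    kept = []
    for answer in candidate_answers:
        # Forward step: produce a question conditioned on passage and answer.
        question = generate_question(passage, answer)
        # Backward step: answer the generated question from the passage.
        if answer_question(passage, question) == answer:
            kept.append((question, answer))
    return kept


# Toy stand-ins for the question-generation and answering models (assumptions).
def toy_generate_question(passage: str, answer: str) -> str:
    return f"What does the passage say about {answer}?"


def toy_answer_question(passage: str, question: str) -> str:
    return "Paris"  # a deliberately crude model that always answers "Paris"


kept = roundtrip_filter(
    "Paris is the capital of France.",
    ["Paris", "France"],
    toy_generate_question,
    toy_answer_question,
)
print(kept)
```

In this toy run only the pair for "Paris" is kept, because the stand-in answering model fails to recover "France" from its generated question.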

pdf bib
Natural Questions: A Benchmark for Question Answering Research
Tom Kwiatkowski | Jennimaria Palomaki | Olivia Redfield | Michael Collins | Ankur Parikh | Chris Alberti | Danielle Epstein | Illia Polosukhin | Jacob Devlin | Kenton Lee | Kristina Toutanova | Llion Jones | Matthew Kelcey | Ming-Wei Chang | Andrew M. Dai | Jakob Uszkoreit | Quoc Le | Slav Petrov
Transactions of the Association for Computational Linguistics, Volume 7

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.

pdf bib
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark | Kenton Lee | Ming-Wei Chang | Tom Kwiatkowski | Michael Collins | Kristina Toutanova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper we study yes/no questions that are naturally occurring — meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. They often query for complex, non-factoid information, and require difficult entailment-like inference to solve. We also explore the effectiveness of a range of transfer learning baselines. We find that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT. Our best method trains BERT on MultiNLI and then re-trains it on our train set. It achieves 80.4% accuracy compared to 90% accuracy of human annotators (and 62% majority-baseline), leaving a significant gap for future work.

pdf bib
Low-Resource Syntactic Transfer with Unsupervised Source Reordering
Mohammad Sadegh Rasooli | Michael Collins
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We describe a cross-lingual transfer method for dependency parsing that takes into account the problem of word order differences between source and target languages. Our model only relies on the Bible, a considerably smaller parallel data than the commonly used parallel data in transfer methods. We use the concatenation of projected trees from the Bible corpus, and the gold-standard treebanks in multiple source languages along with cross-lingual word representations. We demonstrate that reordering the source treebanks before training on them for a target language improves the accuracy of languages outside the European language family. Our experiments on 68 treebanks (38 languages) in the Universal Dependencies corpus achieve a high accuracy for all languages. Among them, our experiments on 16 treebanks of 12 non-European languages achieve an average UAS absolute improvement of 3.3% over a state-of-the-art method.

pdf bib
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti | Jeffrey Ling | Michael Collins | David Reitter
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. The “Bounding Boxes in Text Transformer” (B2T2) also leverages referential information binding words to portions of the image in a single unified architecture. B2T2 is highly effective on the Visual Commonsense Reasoning benchmark, achieving a new state-of-the-art with a 25% relative reduction in error rate compared to published baselines and obtaining the best performance to date on the public leaderboard (as of May 22, 2019). A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture. A reference implementation of our models is provided.

2018

pdf bib
Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency
Zhuang Ma | Michael Collins
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Noise Contrastive Estimation (NCE) is a powerful parameter estimation method for log-linear models, which avoids calculation of the partition function or its derivatives at each training step, a computationally demanding step in many cases. It is closely related to negative sampling methods, now widely used in NLP. This paper considers NCE-based estimation of conditional models. Conditional models are frequently encountered in practice; however there has not been a rigorous theoretical analysis of NCE in this setting, and we will argue there are subtle but important questions when generalizing NCE to the conditional case. In particular, we analyze two variants of NCE for conditional models: one based on a classification objective, the other based on a ranking objective. We show that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method; we analyze the statistical efficiency of the ranking-based and classification-based variants of NCE; finally we describe experiments on synthetic data and language modeling showing the effectiveness and tradeoffs of both methods.

2017

pdf bib
Source-Side Left-to-Right or Target-Side Left-to-Right? An Empirical Comparison of Two Phrase-Based Decoding Algorithms
Yin-Wen Chang | Michael Collins
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper describes an empirical study of the phrase-based decoding algorithm proposed by Chang and Collins (2017). The algorithm produces a translation by processing the source-language sentence in strictly left-to-right order, differing from commonly used approaches that build the target-language sentence in left-to-right order. Our results show that the new algorithm is competitive with Moses (Koehn et al., 2007) in terms of both speed and BLEU scores.

pdf bib
A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
Yin-Wen Chang | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 5

Decoding of phrase-based translation models in the general case is known to be NP-complete, by a reduction from the traveling salesman problem (Knight, 1999). In practice, phrase-based systems often impose a hard distortion limit that limits the movement of phrases during translation. However, the impact on complexity after imposing such a constraint is not well studied. In this paper, we describe a dynamic programming algorithm for phrase-based decoding with a fixed distortion limit. The runtime of the algorithm is $O(n\,d!\,l\,h^{d+1})$ where n is the sentence length, d is the distortion limit, l is a bound on the number of phrases starting at any position in the sentence, and h is related to the maximum number of target language translations for any source word. The algorithm makes use of a novel representation that gives a new perspective on decoding of phrase-based models.

pdf bib
Cross-Lingual Syntactic Transfer with Limited Resources
Mohammad Sadegh Rasooli | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 5

We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. This method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.

2016

pdf bib
Towards a Convex HMM Surrogate for Word Alignment
Andrei Simion | Michael Collins | Cliff Stein
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Transforming Dependency Structures to Logical Forms for Semantic Parsing
Siva Reddy | Oscar Täckström | Michael Collins | Tom Kwiatkowski | Dipanjan Das | Mark Steedman | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 4

The strongly typed syntax of grammar formalisms such as CCG, TAG, LFG and HPSG offers a synchronous framework for deriving syntactic structures and semantic logical forms. In contrast—partly due to the lack of a strong type system—dependency structures are easy to annotate and have become a widely used form of syntactic analysis for many languages. However, the lack of a type system makes a formal mechanism for deriving logical forms from dependency structures challenging. We address this by introducing a robust system based on the lambda calculus for deriving neo-Davidsonian logical forms from dependency trees. These logical forms are then used for semantic parsing of natural language to Freebase. Experiments on the Free917 and WebQuestions datasets show that our representation is superior to the original dependency trees and that it outperforms a CCG-based representation on this task. Compared to prior work, we obtain the strongest result to date on Free917 and competitive results on WebQuestions.

pdf bib
Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
Karl Stratos | Michael Collins | Daniel Hsu
Transactions of the Association for Computational Linguistics, Volume 4

We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exploit this assumption and extend the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs. In experiments, our algorithm is competitive with strong baselines such as the clustering method of Brown et al. (1992) and the log-linear model of Berg-Kirkpatrick et al. (2010). Furthermore, it produces an interpretable model in which hidden states are automatically lexicalized by words.

pdf bib
Globally Normalized Transition-Based Neural Networks
Daniel Andor | Chris Alberti | David Weiss | Aliaksei Severyn | Alessandro Presta | Kuzman Ganchev | Slav Petrov | Michael Collins
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
On A Strictly Convex IBM Model 1
Andrei Simion | Michael Collins | Cliff Stein
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Density-Driven Cross-Lingual Transfer of Dependency Parsers
Mohammad Sadegh Rasooli | Michael Collins
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Simple Semi-Supervised POS Tagging
Karl Stratos | Michael Collins
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

pdf bib
Structured Training for Neural Network Transition-Based Parsing
David Weiss | Chris Alberti | Michael Collins | Slav Petrov
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Model-based Word Embeddings from Decompositions of Count Matrices
Karl Stratos | Michael Collins | Daniel Hsu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

bib
Transactions of the Association for Computational Linguistics, Volume 3
Michael Collins | Lillian Lee
Transactions of the Association for Computational Linguistics, Volume 3

2014

bib
Transactions of the Association for Computational Linguistics, Volume 2
Dekang Lin | Michael Collins | Lillian Lee
Transactions of the Association for Computational Linguistics, Volume 2

pdf bib
A Provably Correct Learning Algorithm for Latent-Variable PCFGs
Shay B. Cohen | Michael Collins
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Constrained Viterbi Relaxation for Bidirectional Word Alignment
Yin-Wen Chang | Alexander M. Rush | John DeNero | Michael Collins
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Learning Dictionaries for Named Entity Recognition using Minimal Supervision
Arvind Neelakantan | Michael Collins
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Some Experiments with a Convex IBM Model 2
Andrei Simion | Michael Collins | Cliff Stein
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

pdf bib
Optimal Beam Search for Machine Translation
Alexander Rush | Yin-Wen Chang | Michael Collins
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Convex Alternative to IBM Model 2
Andrei Simion | Michael Collins | Cliff Stein
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Experiments with Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen | Karl Stratos | Michael Collins | Dean P. Foster | Lyle Ungar
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Approximate PCFG Parsing Using Tensor Decomposition
Shay B. Cohen | Giorgio Satta | Michael Collins
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Spectral Learning Algorithms for Natural Language Processing
Shay Cohen | Michael Collins | Dean Foster | Karl Stratos | Lyle Ungar
NAACL HLT 2013 Tutorial Abstracts

pdf bib
Spectral Learning of Refinement HMMs
Karl Stratos | Alexander Rush | Shay B. Cohen | Michael Collins
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

bib
Transactions of the Association for Computational Linguistics, Volume 1
Dekang Lin | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 1

2012

pdf bib
Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen | Karl Stratos | Michael Collins | Dean P. Foster | Lyle Ungar
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Spectral Dependency Parsing with Latent Variables
Paramveer Dhillon | Jordan Rodu | Michael Collins | Dean Foster | Lyle Ungar
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Alexander Rush | Roi Reichart | Michael Collins | Amir Globerson
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Alexander M. Rush | Michael Collins
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Dual Decomposition for Natural Language Processing
Michael Collins | Alexander M. Rush
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Lagrangian Relaxation for Inference in Natural Language Processing
Michael Collins
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Yin-Wen Chang | Michael Collins
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Efficient Third-Order Dependency Parsers
Terry Koo | Michael Collins
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
Alexander M. Rush | David Sontag | Michael Collins | Tommi Jaakkola
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Dual Decomposition for Parsing with Non-Projective Head Automata
Terry Koo | Alexander M. Rush | Michael Collins | Tommi Jaakkola | David Sontag
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Mari Ostendorf | Michael Collins | Shri Narayanan | Douglas W. Oard | Lucy Vanderwende
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Mari Ostendorf | Michael Collins | Shri Narayanan | Douglas W. Oard | Lucy Vanderwende
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Non-Projective Parsing for Statistical Machine Translation
Xavier Carreras | Michael Collins
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing
Jun Suzuki | Hideki Isozaki | Xavier Carreras | Michael Collins
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Context-Dependent Mappings from Sentences to Logical Form
Luke Zettlemoyer | Michael Collins
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
TAG, Dynamic Programming, and the Perceptron for Efficient, Feature-Rich Parsing
Xavier Carreras | Michael Collins | Terry Koo
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib
Simple Semi-supervised Dependency Parsing
Terry Koo | Xavier Carreras | Michael Collins
Proceedings of ACL-08: HLT

2007

pdf bib
Structured Prediction Models via the Matrix-Tree Theorem
Terry Koo | Amir Globerson | Xavier Carreras | Michael Collins
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Online Learning of Relaxed CCG Grammars for Parsing to Logical Form
Luke Zettlemoyer | Michael Collins
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Chinese Syntactic Reordering for Statistical Machine Translation
Chao Wang | Michael Collins | Philipp Koehn
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
A Discriminative Model for Tree-to-Tree Translation
Brooke Cowan | Ivona Kučerová | Michael Collins
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

pdf bib
Discriminative Reranking for Natural Language Parsing
Michael Collins | Terry Koo
Computational Linguistics, Volume 31, Number 1, March 2005

pdf bib
Hidden-Variable Models for Discriminative Reranking
Terry Koo | Michael Collins
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Morphology and Reranking for the Statistical Parsing of Spanish
Brooke Cowan | Michael Collins
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Discriminative Syntactic Language Modeling for Speech Recognition
Michael Collins | Brian Roark | Murat Saraclar
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
Clause Restructuring for Statistical Machine Translation
Michael Collins | Philipp Koehn | Ivona Kučerová
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm
Brian Roark | Murat Saraclar | Michael Collins | Mark Johnson
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Incremental Parsing with the Perceptron Algorithm
Michael Collins | Brian Roark
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Max-Margin Parsing
Ben Taskar | Dan Klein | Michael Collins | Daphne Koller | Christopher Manning
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

pdf bib
Head-Driven Statistical Models for Natural Language Parsing
Michael Collins
Computational Linguistics, Volume 29, Number 4, December 2003

2002

pdf bib
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
Michael Collins
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

pdf bib
Reranking an n-gram supertagger
John Chen | Srinivas Bangalore | Michael Collins | Owen Rambow
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)

pdf bib
New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron
Michael Collins | Nigel Duffy
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

pdf bib
Ranking Algorithms for Named Entity Extraction: Boosting and the Voted Perceptron
Michael Collins
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf bib
Parameter Estimation for Statistical Parsing Models: Theory and Practice of
Michael Collins
Proceedings of the Seventh International Workshop on Parsing Technologies

2000

pdf bib
Answer Extraction
Steven Abney | Michael Collins | Amit Singhal
Sixth Applied Natural Language Processing Conference

1999

pdf bib
A Statistical Parser for Czech
Michael Collins | Jan Hajic | Lance Ramshaw | Christoph Tillmann
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

pdf bib
Unsupervised Models for Named Entity Classification
Michael Collins | Yoram Singer
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

pdf bib
Book Reviews: Beyond Grammar: An Experience-based Theory of Language
Michael Collins
Computational Linguistics, Volume 25, Number 3, September 1999

1998

pdf bib
Semantic Tagging using a Probabilistic Context Free Grammar
Michael Collins | Scott Miller
Sixth Workshop on Very Large Corpora

1997

pdf bib
Three Generative, Lexicalised Models for Statistical Parsing
Michael Collins
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

pdf bib
A New Statistical Parser Based on Bigram Lexical Dependencies
Michael John Collins
34th Annual Meeting of the Association for Computational Linguistics

1995

pdf bib
University of Pennsylvania: Description of the University of Pennsylvania System Used for MUC-6
Breck Baldwin | Jeff Reynar | Mike Collins | Jason Eisner | Adwait Ratnaparkhi | Joseph Rosenzweig | Anoop Sarkar | Srinivas
Sixth Message Understanding Conference (MUC-6): Proceedings of a Conference Held in Columbia, Maryland, November 6-8, 1995

pdf bib
Prepositional Phrase Attachment through a Backed-off Model
Michael Collins | James Brooks
Third Workshop on Very Large Corpora
