Donald Metzler


2022

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
Kai Hui | Honglei Zhuang | Tao Chen | Zhen Qin | Jing Lu | Dara Bahri | Ji Ma | Jai Gupta | Cicero Nogueira dos Santos | Yi Tay | Donald Metzler
Findings of the Association for Computational Linguistics: ACL 2022

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws: running the model on all query-document pairs at inference time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to finetune a pretrained encoder-decoder model in the form of document-to-query generation. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference. This results in significant inference-time speedups, since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference. Our experiments show that this new paradigm achieves results that are comparable to the more expensive cross-attention ranking approaches while being up to 6.8X faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.
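The speedup argument above rests on a cost split: document-side encoding runs once, offline, and only query-side scoring by log P(query | document) runs per request. The toy sketch below illustrates that split under a loud assumption: a Dirichlet-smoothed unigram query-likelihood model stands in for the T5 decoder, and term counts stand in for the static encoder embeddings. It is not ED2LM's model, and `encode_document`, `query_log_likelihood`, `rerank` and `mu` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter

# Offline step (hypothetical stand-in for the encoder pass): each document is
# turned into a static representation exactly once and cached. Here that is
# just term counts plus length, not learned embeddings.
def encode_document(text):
    tokens = text.lower().split()
    return Counter(tokens), len(tokens)

# Online step (hypothetical stand-in for the decoder): score log P(query | doc)
# from the cached representation only. Dirichlet smoothing with prior mass `mu`
# is an assumed substitute for a learned decoder.
def query_log_likelihood(query, doc_rep, coll_counts, coll_len, mu=10.0):
    counts, doc_len = doc_rep
    vocab = len(coll_counts)
    score = 0.0
    for term in query.lower().split():
        # Add-one smoothing on the collection model keeps unseen query terms finite.
        p_coll = (coll_counts[term] + 1) / (coll_len + vocab)
        score += math.log((counts[term] + mu * p_coll) / (doc_len + mu))
    return score

def rerank(query, doc_ids, cache, coll_counts, coll_len):
    # Per-query cost: one cheap scoring pass per candidate, no document re-encoding.
    return sorted(
        doc_ids,
        key=lambda d: query_log_likelihood(query, cache[d], coll_counts, coll_len),
        reverse=True,
    )

docs = {
    "d1": "neural rankers use cross attention",
    "d2": "convolutions for pretrained language models",
    "d3": "query generation from documents for ranking",
}
cache = {doc_id: encode_document(text) for doc_id, text in docs.items()}  # done once
coll_counts = Counter()
for counts, _ in cache.values():
    coll_counts.update(counts)
coll_len = sum(length for _, length in cache.values())

print(rerank("ranking", list(docs), cache, coll_counts, coll_len))
# → ['d3', 'd1', 'd2']
```

The design choice being illustrated is only where the work happens: `encode_document` is paid once per document and reused across every query, while `rerank` touches nothing but the cache. A cross-attention re-ranker, by contrast, would re-run its full model on every (query, document) pair.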

2021

How Reliable are Model Diagnostics?
Vamsi Aribandi | Yi Tay | Donald Metzler
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Are Pretrained Convolutions Better than Pretrained Transformers?
Yi Tay | Mostafa Dehghani | Jai Prakash Gupta | Vamsi Aribandi | Dara Bahri | Zhen Qin | Donald Metzler
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In the era of pre-trained language models, Transformers are the de facto choice of model architecture. While recent research has shown promise in entirely convolutional, or CNN, architectures, they have not been explored using the pre-train-fine-tune paradigm. In the context of language models, are convolutional models competitive with Transformers when pre-trained? This paper investigates this research question and presents several interesting findings. Across an extensive set of experiments on 8 datasets/tasks, we find that CNN-based pre-trained models are competitive and outperform their Transformer counterparts in certain scenarios, albeit with caveats. Overall, the findings outlined in this paper suggest that conflating pre-training and architectural advances is misguided and that both advances should be considered independently. We believe our research paves the way for a healthy amount of optimism in alternative architectures.

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
Yikang Shen | Yi Tay | Che Zheng | Dara Bahri | Donald Metzler | Aaron Courville
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

There are two major classes of natural language grammars: the dependency grammar, which models one-to-one correspondences between words, and the constituency grammar, which models the assembly of one or several corresponding words. While previous unsupervised parsing methods mostly focus on inducing only one class of grammars, we introduce a novel model, StructFormer, that can induce dependency and constituency structure at the same time. To achieve this, we propose a new parsing framework that can jointly generate a constituency tree and dependency graph. Then we integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism. Experimental results show that our model can achieve strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling at the same time.

2020

Reverse Engineering Configurations of Neural Text Generation Models
Yi Tay | Dara Bahri | Che Zheng | Clifford Brunk | Donald Metzler | Andrew Tomkins
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent advances in neural text generation modeling have resulted in a number of societal concerns related to how such approaches might be used in malicious ways. It is therefore desirable to develop a deeper understanding of the fundamental properties of such models. The study of artifacts that emerge in machine-generated text as a result of modeling choices is a nascent research area, and the extent to which these artifacts surface in generated text is still unclear. In the spirit of better understanding generative text models and their artifacts, we propose the new task of distinguishing which of several variants of a given model generated some piece of text. Specifically, we conduct an extensive suite of diagnostic tests to observe whether modeling choices (e.g., sampling methods, top-k probabilities, model architectures, etc.) leave detectable artifacts in the text they generate. Our key finding, which is backed by a rigorous set of experiments, is that such artifacts are present and that different modeling choices can be inferred by looking at generated text alone. This suggests that neural text generators may actually be more sensitive to various modeling choices than previously thought.

2012

Structured Event Retrieval over Microblog Archives
Donald Metzler | Congxing Cai | Eduard Hovy
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
Donald Metzler | Eduard Hovy | Chunliang Zhang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Contextual Bearing on Linguistic Variation in Social Media
Stephan Gouws | Donald Metzler | Congxing Cai | Eduard Hovy
Proceedings of the Workshop on Language in Social Media (LSM 2011)

Unsupervised Mining of Lexical Variants from Noisy Text
Stephan Gouws | Dirk Hovy | Donald Metzler
Proceedings of the First workshop on Unsupervised Learning in NLP

2009

Search Engine Adaptation by Feedback Control Adjustment for Time-sensitive Query
Ruiqiang Zhang | Yi Chang | Zhaohui Zheng | Donald Metzler | Jian-yun Nie
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers