Jimmy Lin


2021

Scientific Claim Verification with VerT5erini
Ronak Pradeep | Xueguang Ma | Rodrigo Nogueira | Jimmy Lin
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis

This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain. We propose a system called VerT5erini that exploits T5 for abstract retrieval, sentence selection, and label prediction, which are three critical sub-tasks of claim verification. We evaluate our pipeline on SciFACT, a newly curated dataset that requires models to not just predict the veracity of claims but also provide relevant sentences from a corpus of scientific literature that support the prediction. Empirically, our system outperforms a strong baseline in each of the three sub-tasks. We further show VerT5erini’s ability to generalize to two new datasets of COVID-19 claims using evidence from the CORD-19 corpus.

In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval
Sheng-Chieh Lin | Jheng-Hong Yang | Jimmy Lin
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model. Specifically, we propose to transfer the knowledge from a bi-encoder teacher to a student by distilling knowledge from ColBERT’s expressive MaxSim operator into a simple dot product. The advantage of the bi-encoder teacher–student setup is that we can efficiently add in-batch negatives during knowledge distillation, enabling richer interactions between teacher and student models. In addition, using ColBERT as the teacher reduces training cost compared to a full cross-encoder. Experiments on the MS MARCO passage and document ranking tasks and data from the TREC 2019 Deep Learning Track demonstrate that our approach helps models learn robust representations for dense retrieval effectively and efficiently.
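
The distillation setup described in this abstract can be illustrated with a minimal sketch. It assumes a MaxSim teacher score (each query token takes its best-matching passage token, summed over query tokens), a dot-product student score over pooled vectors, and a KL-divergence loss over every query-passage pair in the batch. The function names, the mean pooling, and the choice of KL divergence are illustrative assumptions, not details taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_tokens, passage_tokens):
    # Late interaction: each query token picks its best-matching passage
    # token, and the per-token maxima are summed.
    return sum(max(dot(q, d) for d in passage_tokens) for q in query_tokens)

def mean_pool(tokens):
    # Hypothetical pooling: collapse token vectors into one vector per text.
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def in_batch_kd_loss(teacher_scores, student_scores):
    # Both arguments are B x B matrices: row i scores query i against every
    # passage in the batch, so the other queries' passages act as negatives.
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        p = softmax(t_row)
        q = softmax(s_row)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_scores)

def score_batch(queries, passages):
    # Returns (teacher, student) score matrices for a toy batch.
    teacher = [[maxsim(q, p) for p in passages] for q in queries]
    pooled_q = [mean_pool(q) for q in queries]
    pooled_p = [mean_pool(p) for p in passages]
    student = [[dot(q, p) for p in pooled_p] for q in pooled_q]
    return teacher, student
```

Because both score matrices are built from per-text encodings alone, adding in-batch negatives costs only one extra matrix of similarity computations in this sketch.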

BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression
Ji Xin | Raphael Tang | Yaoliang Yu | Jimmy Lin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods to make exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting to tasks other than classification. Experiments demonstrate improved early exiting for BERT, with better trade-offs obtained by the proposed fine-tuning strategy, successful application to regression tasks, and the possibility to combine it with other acceleration methods. Source code can be found at https://github.com/castorini/berxit.

Don’t Change Me! User-Controllable Selective Paraphrase Generation
Mohan Zhang | Luchen Tan | Zihang Fu | Kun Xiong | Jimmy Lin | Ming Li | Zhengkai Tu
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In the paraphrase generation task, source sentences often contain phrases that should not be altered. Which phrases, however, can be context dependent and can vary by application. Our solution to this challenge is to provide the user with explicit tags that can be placed around any arbitrary segment of text to mean “don’t change me!” when generating a paraphrase; the model learns to explicitly copy these phrases to the output. The contribution of this work is a novel data generation technique using distant supervision that allows us to start with a pretrained sequence-to-sequence model and fine-tune a paraphrase generator that exhibits this behavior, allowing user-controllable paraphrase generation. Additionally, we modify the loss during fine-tuning to explicitly encourage diversity in model output. Our technique is language agnostic, and we report experiments in English and Chinese.

Bag-of-Words Baselines for Semantic Code Search
Xinyu Zhang | Ji Xin | Andrew Yates | Jimmy Lin
Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021)

The task of semantic code search is to retrieve code snippets from a source code corpus based on an information need expressed in natural language. The semantic gap between natural language and programming languages has long been regarded as one of the most significant obstacles to the effectiveness of keyword-based information retrieval (IR) methods. It is a common assumption that “traditional” bag-of-words IR methods are poorly suited for semantic code search: our work empirically investigates this assumption. Specifically, we examine the effectiveness of two traditional IR methods, namely BM25 and RM3, on the CodeSearchNet Corpus, which consists of natural language queries paired with relevant code snippets. We find that the two keyword-based methods outperform several pre-BERT neural models. We also compare several code-specific data pre-processing strategies and find that specialized tokenization improves effectiveness.

Pretrained Transformers for Text Ranking: BERT and Beyond
Andrew Yates | Rodrigo Nogueira | Jimmy Lin
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. Although the most common formulation of text ranking is search, instances of the task can also be found in many text processing applications. This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example. These models produce high quality results across many domains, tasks, and settings. This tutorial, which is based on the preprint of a forthcoming book to be published by Morgan & Claypool under the Synthesis Lectures on Human Language Technologies series, provides an overview of existing work as a single point of entry for practitioners who wish to deploy transformers for text ranking in real-world applications and researchers who wish to pursue work in this area. We cover a wide range of techniques, grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly.

The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing
Ji Xin | Raphael Tang | Yaoliang Yu | Jimmy Lin
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In selective prediction, a classifier is allowed to abstain from making predictions on low-confidence examples. Though this setting is interesting and important, selective prediction has rarely been examined in natural language processing (NLP) tasks. To fill this void in the literature, we study in this paper selective prediction for NLP, comparing different models and confidence estimators. We further propose a simple error regularization trick that improves confidence estimation without substantially increasing the computation budget. We show that recent pre-trained transformer models simultaneously improve both model accuracy and confidence estimation effectiveness. We also find that our proposed regularization improves confidence estimation and can be applied to other relevant scenarios, such as using classifier cascades for accuracy–efficiency trade-offs. Source code for this paper can be found at https://github.com/castorini/transformers-selective.
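
The selective prediction setting can be sketched in a few lines. This example assumes the maximum class probability as the confidence estimator and a fixed threshold for abstention; both are illustrative choices, and the abstract's error regularization is not reproduced here.

```python
def selective_predict(probs, threshold):
    """Return the predicted class index, or None to abstain."""
    confidence = max(probs)  # assumed confidence estimator: max probability
    if confidence < threshold:
        return None
    return probs.index(confidence)

def coverage_and_risk(all_probs, labels, threshold):
    """Coverage = fraction of examples answered; risk = error rate on those."""
    answered = 0
    errors = 0
    for probs, gold in zip(all_probs, labels):
        pred = selective_predict(probs, threshold)
        if pred is None:
            continue
        answered += 1
        errors += int(pred != gold)
    coverage = answered / len(labels)
    risk = errors / answered if answered else 0.0
    return coverage, risk
```

Sweeping the threshold traces out the trade-off between coverage and risk: a higher threshold answers fewer examples, which pays off only if the confidence estimator ranks errors below correct predictions.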

Exploring Listwise Evidence Reasoning with T5 for Fact Verification
Kelvin Jiang | Ronak Pradeep | Jimmy Lin
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This work explores a framework for fact verification that leverages pretrained sequence-to-sequence transformer models for sentence selection and label prediction, two key sub-tasks in fact verification. Most notably, improving on previous pointwise aggregation approaches for label prediction, we take advantage of T5 using a listwise approach coupled with data augmentation. With this enhancement, we observe that our label prediction stage is more robust to noise and capable of verifying complex claims by jointly reasoning over multiple pieces of evidence. Experimental results on the FEVER task show that our system attains a FEVER score of 75.87% on the blind test set. This puts our approach atop the competitive FEVER leaderboard at the time of our work, scoring higher than the second place submission by almost two points in label accuracy and over one point in FEVER score.

Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2
He Bai | Peng Shi | Jimmy Lin | Luchen Tan | Kun Xiong | Wen Gao | Jie Liu | Ming Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

The semantics of a text is manifested not only by what is read but also by what is not read. In this article, we study how implicit “not read” information, such as end-of-paragraph (EOP) and end-of-sequence (EOS) tokens, affects the quality of text generation. Specifically, we find that the pre-trained language model GPT2 can generate better continuations by learning to generate the EOP token in the fine-tuning stage. Experimental results on English story generation show that EOP can lead to higher BLEU scores and lower perplexity. We also conduct experiments on a self-collected Chinese essay dataset with Chinese-GPT2, a character-level LM without EOP and EOS during pre-training. Experimental results show that the Chinese GPT2 can generate better essay endings with EOP.

2020

Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models
Jheng-Hong Yang | Sheng-Chieh Lin | Rodrigo Nogueira | Ming-Feng Tsai | Chuan-Ju Wang | Jimmy Lin
Proceedings of the 28th International Conference on Computational Linguistics

While internalized “implicit knowledge” in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question. Based on the text-to-text transfer transformer (T5) model, this work explores a template-based approach to extract implicit knowledge for commonsense reasoning on multiple-choice (MC) question answering tasks. Experiments on three representative MC datasets show the surprisingly good performance of our simple template, coupled with a logit normalization technique for disambiguation. Furthermore, we verify that our proposed template can be easily extended to other MC tasks with contexts such as supporting facts in open-book question answering settings. Starting from the MC task, this work initiates further research to find generic natural language templates that can effectively leverage stored knowledge in pretrained models.

Document Ranking with a Pretrained Sequence-to-Sequence Model
Rodrigo Nogueira | Zhiying Jiang | Ronak Pradeep | Jimmy Lin
Findings of the Association for Computational Linguistics: EMNLP 2020

This work proposes the use of a pretrained sequence-to-sequence model for document ranking. Our approach is fundamentally different from a commonly adopted classification-based formulation based on encoder-only pretrained transformer architectures such as BERT. We show how a sequence-to-sequence model can be trained to generate relevance labels as “target tokens”, and how the underlying logits of these target tokens can be interpreted as relevance probabilities for ranking. Experimental results on the MS MARCO passage ranking task show that our ranking approach is superior to strong encoder-only models. On three other document retrieval test collections, we demonstrate a zero-shot transfer-based approach that outperforms previous state-of-the-art models requiring in-domain cross-validation. Furthermore, we find that our approach significantly outperforms an encoder-only architecture in a data-poor setting. We investigate this observation in more detail by varying target tokens to probe the model’s use of latent knowledge. Surprisingly, we find that the choice of target tokens impacts effectiveness, even for words that are closely related semantically. This finding sheds some light on why our sequence-to-sequence formulation for document ranking is effective. Code and models are available at pygaggle.ai.
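
One way to read target-token logits as relevance probabilities is sketched below. It assumes two target tokens, one meaning relevant and one meaning non-relevant, and a softmax restricted to their two logits; the logit values are made up for illustration and the helper names are hypothetical.

```python
import math

def relevance_probability(logit_relevant, logit_nonrelevant):
    # Softmax over only the two target-token logits (assumed formulation),
    # computed stably by subtracting the larger logit first.
    m = max(logit_relevant, logit_nonrelevant)
    e_rel = math.exp(logit_relevant - m)
    e_non = math.exp(logit_nonrelevant - m)
    return e_rel / (e_rel + e_non)

def rerank(candidates):
    """candidates: list of (doc_id, logit_relevant, logit_nonrelevant).
    Returns (doc_id, probability) pairs sorted by descending probability."""
    scored = [(doc_id, relevance_probability(rel, non))
              for doc_id, rel, non in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a real system the two logits would come from the decoder's first output step for each query-document pair; here they are simply passed in as numbers.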

Cross-Lingual Training of Neural Models for Document Ranking
Peng Shi | He Bai | Jimmy Lin
Findings of the Association for Computational Linguistics: EMNLP 2020

We tackle the challenge of cross-lingual training of neural document ranking models for mono-lingual retrieval, specifically leveraging relevance judgments in English to improve search in non-English languages. Our work successfully applies multi-lingual BERT (mBERT) to document ranking and additionally compares against a number of alternatives: translating the training data, translating documents, multi-stage hybrids, and ensembles. Experiments on test collections in six different languages from diverse language families reveal many interesting findings: model-based relevance transfer using mBERT can significantly improve search quality in (non-English) mono-lingual retrieval, but other “low resource” approaches are competitive as well.

Inserting Information Bottlenecks for Attribution in Transformers
Zhiying Jiang | Raphael Tang | Ji Xin | Jimmy Lin
Findings of the Association for Computational Linguistics: EMNLP 2020

Pretrained transformers achieve the state of the art across tasks in natural language processing, motivating researchers to investigate their inner mechanisms. One common direction is to understand what features are important for prediction. In this paper, we apply information bottlenecks to analyze the attribution of each feature for prediction on a black-box model. We use BERT as the example and evaluate our approach both quantitatively and qualitatively. We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers. We demonstrate that our technique outperforms two competitive methods in degradation tests on four datasets. Code is available at https://github.com/bazingagin/IBA.

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Ji Xin | Raphael Tang | Jaejun Lee | Yaoliang Yu | Jimmy Lin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to ~40% inference time with minimal degradation in model quality. Further analyses show different behaviors in the BERT transformer layers and also reveal their redundancy. Our work provides new ideas to efficiently apply deep transformer-based models to downstream tasks. Code is available at https://github.com/castorini/DeeBERT.
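
The idea of letting a sample exit before the final layer can be sketched as follows. The sketch assumes one classifier per layer and an entropy threshold as the exit criterion; the abstract does not state the criterion, so both the criterion and the function names are assumptions.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_predict(layer_outputs, threshold):
    """layer_outputs: iterable of per-layer class distributions, shallowest
    first. Returns (predicted_class, exit_layer_index).

    Passing a generator means deeper layers are never computed once a
    sufficiently confident (low-entropy) layer has been found.
    """
    probs, index = None, -1
    for index, probs in enumerate(layer_outputs):
        if entropy(probs) < threshold:
            break  # confident enough: stop here
    if probs is None:
        raise ValueError("layer_outputs must not be empty")
    # If no layer met the threshold, probs holds the final layer's output.
    return probs.index(max(probs)), index
```

A larger threshold lets more samples leave at shallow layers and saves more computation, at the price of accepting less confident predictions.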

Showing Your Work Doesn’t Always Work
Raphael Tang | Jaejun Lee | Ji Xin | Xinyu Liu | Yaoliang Yu | Jimmy Lin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks. One exemplar publication, titled “Show Your Work: Improved Reporting of Experimental Results” (Dodge et al., 2019), advocates for reporting the expected validation effectiveness of the best-tuned model, with respect to the computational budget. In the present work, we critically examine this paper. As far as statistical generalizability is concerned, we find unspoken pitfalls and caveats with this approach. We analytically show that their estimator is biased and uses error-prone assumptions. We find that the estimator favors negative errors and yields poor bootstrapped confidence intervals. We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation. Our codebase is at https://github.com/castorini/meanmax.
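
To make the bias argument concrete, the sketch below contrasts two ways of estimating the expected best validation score among n tuning runs, given N observed scores. The abstract does not give formulas, so both are assumptions: a plug-in estimator that resamples from the empirical distribution with replacement, and a without-replacement estimator that averages the maximum over all size-n subsets and is unbiased for n ≤ N.

```python
from math import comb

def expected_max_plugin(values, n):
    """Plug-in estimate of E[max of n draws], drawing with replacement from
    the empirical distribution of the observed values."""
    v = sorted(values)
    N = len(v)
    # P(max is the i-th smallest value) = (i/N)^n - ((i-1)/N)^n
    return sum(x * ((i / N) ** n - ((i - 1) / N) ** n)
               for i, x in enumerate(v, start=1))

def expected_max_without_replacement(values, n):
    """Average of the maximum over all size-n subsets of the observations."""
    v = sorted(values)
    N = len(v)
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= len(values)")
    # The i-th smallest value is the maximum of C(i-1, n-1) of the subsets.
    return sum(x * comb(i - 1, n - 1)
               for i, x in enumerate(v, start=1)) / comb(N, n)
```

For the observations 1, 2, 3 and n = 2, the plug-in form gives 22/9 while the subset average gives 8/3, so the plug-in estimate sits lower in this small case.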

Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data
Hamidreza Shahidi | Ming Li | Jimmy Lin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A number of researchers have recently questioned the necessity of increasingly complex neural network (NN) architectures. In particular, several recent papers have shown that simpler, properly tuned models are at least competitive across several NLP tasks. In this work, we show that this is also the case for text generation from structured and unstructured data. We consider neural table-to-text generation and neural question generation (NQG) tasks for text generation from structured and unstructured data, respectively. Table-to-text generation aims to generate a description based on a given table, and NQG is the task of generating a question from a given passage where the generated question can be answered by a certain sub-span of the passage using NN models. Experimental results demonstrate that a basic attention-based seq2seq model trained with the exponential moving average technique achieves the state of the art in both tasks. Code is available at https://github.com/h-shahidi/2birds-gen.
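
The exponential moving average technique mentioned in this abstract can be sketched as a running average of model parameters kept alongside training. The decay value and the flat-list representation of parameters are illustrative assumptions.

```python
def ema_update(shadow, params, decay):
    """Blend the current parameters into the running (shadow) average."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]

def train_with_ema(param_trajectory, decay=0.99):
    """param_trajectory: parameter vectors after each training step.
    Returns the averaged parameters that would be used at evaluation time."""
    shadow = list(param_trajectory[0])  # initialize from the first step
    for params in param_trajectory[1:]:
        shadow = ema_update(shadow, params, decay)
    return shadow
```

The averaged weights change more smoothly than the raw weights, which is the usual motivation for evaluating with them.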

Howl: A Deployed, Open-Source Wake Word Detection System
Raphael Tang | Jaejun Lee | Afsaneh Razi | Julia Cambre | Ian Bicking | Jofish Kaye | Jimmy Lin
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)

We describe Howl, an open-source wake word detection toolkit with native support for open speech datasets such as Mozilla Common Voice (MCV) and Google Speech Commands (GSC). We report benchmark results of various models supported by our toolkit on GSC and our own freely available wake word detection dataset, built from MCV. One of our models is deployed in Firefox Voice, a plugin enabling speech interactivity for the Firefox web browser. Howl represents, to the best of our knowledge, the first fully productionized, open-source wake word detection toolkit with a web browser deployment target. Our codebase is at howl.ai.

Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT
Ashutosh Adhikari | Achyudh Ram | Raphael Tang | William L. Hamilton | Jimmy Lin
Proceedings of the 5th Workshop on Representation Learning for NLP

Fine-tuned variants of BERT are able to achieve state-of-the-art accuracy on many natural language processing tasks, although at significant computational costs. In this paper, we verify BERT’s effectiveness for document classification and investigate the extent to which BERT-level effectiveness can be obtained by different baselines, combined with knowledge distillation—a popular model compression method. The results show that BERT-level effectiveness can be achieved by a single-layer LSTM with at least 40× fewer FLOPS and only ∼3% parameters. More importantly, this study analyzes the limits of knowledge distillation as we distill BERT’s knowledge all the way down to linear models—a relevant baseline for the task. We report substantial improvement in effectiveness for even the simplest models, as they capture the knowledge learnt by BERT.

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset
Edwin Zhang | Nikhil Gupta | Rodrigo Nogueira | Kyunghyun Cho | Jimmy Lin
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen Institute for AI. It exists as part of a suite of tools we have developed to help domain experts tackle the ongoing global pandemic. We hope that improved information access capabilities to the scientific literature can inform evidence-based decision making and insight generation.

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
Edwin Zhang | Nikhil Gupta | Raphael Tang | Xiao Han | Ronak Pradeep | Kuang Lu | Yue Zhang | Rodrigo Nogueira | Kyunghyun Cho | Hui Fang | Jimmy Lin
Proceedings of the First Workshop on Scholarly Document Processing

We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the multi-round TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the best systems. In round 3, we submitted the highest-scoring run that took advantage of previous training data and the second-highest fully automatic run. In rounds 4 and 5, we submitted the highest-scoring fully automatic runs.

Cydex: Neural Search Infrastructure for the Scholarly Literature
Shane Ding | Edwin Zhang | Jimmy Lin
Proceedings of the First Workshop on Scholarly Document Processing

Cydex is a platform that provides neural search infrastructure for domain-specific scholarly literature. The platform represents an abstraction of Covidex, our recently developed full-stack open-source search engine for the COVID-19 Open Research Dataset (CORD-19) from AI2. While Covidex takes advantage of the latest best practices for keyword search using the popular Lucene search library as well as state-of-the-art neural ranking models using T5, parts of the system were hard coded to only work with CORD-19. This paper describes our efforts to generalize Covidex into Cydex, which can be applied to scholarly literature in different domains. By decoupling corpus-specific configurations from the frontend implementation, we are able to demonstrate the generality of Cydex on two very different corpora: the ACL Anthology and a collection of hydrology abstracts. Our platform is entirely open source and available at cydex.ai.

Early Exiting BERT for Efficient Document Ranking
Ji Xin | Rodrigo Nogueira | Yaoliang Yu | Jimmy Lin
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Pre-trained language models such as BERT have shown their effectiveness in various tasks. Despite their power, they are known to be computationally intensive, which hinders real-world applications. In this paper, we introduce early exiting BERT for document ranking. With a slight modification, BERT becomes a model with multiple output paths, and each inference sample can exit early from these paths. In this way, computation can be effectively allocated among samples, and overall system latency is significantly reduced while the original quality is maintained. Our experiments on two document ranking datasets demonstrate up to 2.5x inference speedup with minimal quality degradation. The source code of our implementation can be found at https://github.com/castorini/earlyexiting-monobert.

A Little Bit Is Worse Than None: Ranking with Limited Training Data
Xinyu Zhang | Andrew Yates | Jimmy Lin
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Researchers have proposed simple yet effective techniques for the retrieval problem based on using BERT as a relevance classifier to rerank initial candidates from keyword search. In this work, we tackle the challenge of fine-tuning these models for specific domains in a data and computationally efficient manner. Typically, researchers fine-tune models using corpus-specific labeled data from sources such as TREC. We first answer the question: How much data of this type do we need? Recognizing that the most computationally efficient training is no training, we explore zero-shot ranking using BERT models that have already been fine-tuned with the large MS MARCO passage retrieval dataset. We arrive at the surprising and novel finding that “some” labeled in-domain data can be worse than none at all.

2019

Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling
Linqing Liu | Wei Yang | Jinfeng Rao | Raphael Tang | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Semantic similarity modeling is central to many NLP problems such as natural language inference and question answering. Syntactic structures interact closely with semantics in learning compositional representations and alleviating long-range dependency issues. However, such structure priors have not been well exploited in previous work for semantic modeling. To examine their effectiveness, we start with the Pairwise Word Interaction Model, one of the best models according to a recent reproducibility study, then introduce components for modeling context and structure using multi-layer BiLSTMs and TreeLSTMs. In addition, we introduce residual connections to the deep convolutional neural network component of the model. Extensive evaluations on eight benchmark datasets show that incorporating structural information contributes to consistent improvements over strong baselines.

Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
Zeynep Akkalyoncu Yilmaz | Wei Yang | Haotian Zhang | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper applies BERT to ad hoc document retrieval on news articles, which requires addressing two challenges: relevance judgments in existing test collections are typically provided only at the document level, and documents often exceed the length that BERT was designed to handle. Our solution is to aggregate sentence-level evidence to rank documents. Furthermore, we are able to leverage passage-level relevance judgments fortuitously available in other domains to fine-tune BERT models that are able to capture cross-domain notions of relevance, and can be directly used for ranking news articles. Our simple neural ranking models achieve state-of-the-art effectiveness on three standard test collections.
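
Aggregating sentence-level evidence into a document score can be sketched as below. The abstract does not specify the aggregation, so this example assumes a linear interpolation between a first-stage retrieval score and a weighted sum of the highest-scoring sentences; `alpha`, `weights`, and the function name are hypothetical.

```python
def document_score(retrieval_score, sentence_scores,
                   alpha=0.5, weights=(1.0, 0.5, 0.25)):
    """Combine a document-level retrieval score with sentence-level evidence.

    Only the top len(weights) sentences contribute, so documents longer than
    the model's input limit can still be scored one sentence at a time.
    """
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    evidence = sum(w * s for w, s in zip(weights, top))
    return alpha * retrieval_score + (1.0 - alpha) * evidence

def rank_documents(docs, **kwargs):
    """docs: dict mapping doc_id -> (retrieval_score, sentence_scores)."""
    scored = {doc_id: document_score(r, s, **kwargs)
              for doc_id, (r, s) in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

In practice the interpolation weight and the per-rank sentence weights would be tuned on held-out queries rather than fixed as they are here.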

Aligning Cross-Lingual Entities with Multi-Aspect Information
Hsiu-Wei Yang | Yanyan Zou | Peng Shi | Wei Lu | Jimmy Lin | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Multilingual knowledge graphs (KGs), such as YAGO and DBpedia, represent entities in different languages. The task of cross-lingual entity alignment is to match entities in a source language with their counterparts in target languages. In this work, we investigate embedding-based approaches to encode entities from multilingual KGs into the same vector space, where equivalent entities are close to each other. Specifically, we apply graph convolutional networks (GCNs) to combine multi-aspect information of entities, including topological connections, relations, and attributes of entities, to learn entity embeddings. To exploit the literal descriptions of entities expressed in different languages, we propose two uses of a pretrained multilingual BERT model to bridge cross-lingual gaps. We further propose two strategies to integrate GCN-based and BERT-based modules to boost performance. Extensive experiments on two benchmark datasets demonstrate that our method significantly outperforms existing systems.

Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
Jinfeng Rao | Linqing Liu | Yi Tay | Wei Yang | Peng Shi | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

A core problem of information retrieval (IR) is relevance matching, which is to rank documents by relevance to a user’s query. On the other hand, many NLP problems, such as question answering and paraphrase identification, can be considered variants of semantic matching, which is to measure the semantic distance between two pieces of short texts. While at a high level both relevance and semantic matching require modeling textual similarity, many existing techniques for one cannot be easily adapted to the other. To bridge this gap, we propose a novel model, HCAN (Hybrid Co-Attention Network), that comprises (1) a hybrid encoder module that includes ConvNet-based and LSTM-based encoders, (2) a relevance matching module that measures soft term matches with importance weighting at multiple granularities, and (3) a semantic matching module with co-attention mechanisms that capture context-aware semantic relatedness. Evaluations on multiple IR and NLP benchmarks demonstrate state-of-the-art effectiveness compared to approaches that do not exploit pretraining on external data. Extensive ablation studies suggest that relevance and semantic matching signals are complementary across many problem settings, regardless of the choice of underlying encoders.

What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons
Ji Xin | Jimmy Lin | Yaoliang Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Memory neurons of long short-term memory (LSTM) networks encode and process information in powerful yet mysterious ways. While there has been work to analyze their behavior in carrying low-level information such as linguistic properties, how they directly contribute to label prediction remains unclear. We find inspiration from biologists and study the affinity between individual neurons and labels, propose a novel metric to quantify the sensitivity of neurons to each label, and conduct experiments to show the validity of our proposed metric. We discover that some neurons are trained to specialize on a subset of labels, and while dropping an arbitrary neuron has little effect on the overall accuracy of the model, dropping label-specialized neurons predictably and significantly degrades prediction accuracy on the associated label. We further examine the consistency of neuron-label affinity across different models. These observations provide insight into the inner mechanisms of LSTMs.

pdf bib
Applying BERT to Document Retrieval with Birch
Zeynep Akkalyoncu Yilmaz | Shengjin Wang | Wei Yang | Haotian Zhang | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections. Birch implements simple ranking models that achieve state-of-the-art effectiveness on standard TREC newswire and social media test collections. This demonstration focuses on technical challenges in the integration of NLP and IR capabilities, along with the design rationale behind our approach to tightly-coupled integration between Python (to support neural networks) and the Java Virtual Machine (to support document retrieval using the open-source Lucene search library). We demonstrate integration of Birch with an existing search interface as well as interactive notebooks that highlight its capabilities in an easy-to-understand manner.

pdf bib
Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting
Jaejun Lee | Raphael Tang | Jimmy Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Used for simple command recognition on devices from smart speakers to mobile phones, keyword spotting systems are everywhere. Ubiquitous as well are web applications, which have grown in popularity and complexity over the last decade. However, despite their obvious advantages in natural language interaction, voice-enabled web applications are still few and far between. We attempt to bridge this gap with Honkling, a novel, JavaScript-based keyword spotting system. Purely client-side and cross-device compatible, Honkling can be deployed directly on user devices. Our in-browser implementation enables seamless personalization, which can greatly improve model quality; in the presence of underrepresented, non-American user accents, we can achieve up to an absolute 10% increase in accuracy in the personalized model with only a few examples.

pdf bib
Natural Language Generation for Effective Knowledge Distillation
Raphael Tang | Yao Lu | Jimmy Lin
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Knowledge distillation can effectively transfer knowledge from BERT, a deep language representation model, to traditional, shallow word embedding-based neural networks, helping them approach or exceed the quality of other heavyweight language representation models. As shown in previous work, critical to this distillation procedure is the construction of an unlabeled transfer dataset, which enables effective knowledge transfer. To create transfer set examples, we propose to sample from pretrained language models fine-tuned on task-specific text. Unlike previous techniques, this directly captures the purpose of the transfer set. We hypothesize that this principled, general approach outperforms rule-based techniques. On four datasets in sentiment classification, sentence similarity, and linguistic acceptability, we show that our approach improves upon previous methods. We outperform OpenAI GPT, a deep pretrained transformer, on three of the datasets, while using a single-layer bidirectional LSTM that runs at least ten times faster.

pdf bib
Scalable Knowledge Graph Construction from Text Collections
Ryan Clancy | Ihab F. Ilyas | Jimmy Lin
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

We present a scalable, open-source platform that “distills” a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations that are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j’s native Cypher query language: We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly-supervised training data can all be performed within the same framework.

pdf bib
Simple Attention-Based Representation Learning for Ranking Short Social Media Posts
Peng Shi | Jinfeng Rao | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

This paper explores the problem of ranking short social media posts with respect to user queries using neural networks. Instead of starting with a complex architecture, we proceed from the bottom up and examine the effectiveness of a simple, word-level Siamese architecture augmented with attention-based mechanisms for capturing semantic “soft” matches between query and post tokens. Extensive experiments on datasets from the TREC Microblog Tracks show that our simple models not only achieve better effectiveness than existing approaches that are far more complex or exploit a more diverse set of relevance signals, but are also much faster.

pdf bib
Rethinking Complex Neural Network Architectures for Document Classification
Ashutosh Adhikari | Achyudh Ram | Raphael Tang | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural network models for many NLP tasks have grown increasingly complex in recent years, making training and deployment more difficult. A number of recent papers have questioned the necessity of such architectures and found that well-executed, simpler models are quite effective. We show that this is also the case for document classification: in a large-scale reproducibility study of several recent neural models, we find that a simple BiLSTM architecture with appropriate regularization yields accuracy and F1 that either are competitive with or exceed the state of the art on four standard benchmark datasets. Surprisingly, our simple model is able to achieve these results without attention mechanisms. While these regularization techniques, borrowed from language modeling, are not novel, to our knowledge we are the first to apply them in this context. Our work provides an open-source platform and the foundation for future work in document classification.

pdf bib
Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features
Wei Yang | Luchen Tan | Chunwei Lu | Anqi Cui | Han Li | Xi Chen | Kun Xiong | Muzi Wang | Ming Li | Jian Pei | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Consumers dissatisfied with the normal dispute resolution process provided by an e-commerce company’s customer service agents have the option of escalating their complaints by filing grievances with a government authority. This paper tackles the challenge of monitoring ongoing text chat dialogues to identify cases where the customer expresses such an intent, providing triage and prioritization for a separate pool of agents specially trained to handle more complex situations. We describe a hybrid model that tackles this challenge by integrating recurrent neural networks with manually-engineered features. Experiments show that both components are complementary and contribute to overall recall, outperforming competitive baselines. A trial online deployment of our model demonstrates its business value in improving customer service.

pdf bib
End-to-End Open-Domain Question Answering with BERTserini
Wei Yang | Yuqing Xie | Aileen Lin | Xingyu Li | Luchen Tan | Kun Xiong | Ming Li | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.

2018

pdf bib
Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia
Michael Azmy | Peng Shi | Jimmy Lin | Ihab Ilyas
Proceedings of the 27th International Conference on Computational Linguistics

Question answering over knowledge graphs is an important problem of interest both commercially and academically. There is substantial interest in the class of natural language questions that can be answered via the lookup of a single fact, driven by the availability of the popular SimpleQuestions dataset. The problem with this dataset, however, is that answer triples are provided from Freebase, which has been defunct for several years. As a result, it is difficult to build “real-world” question answering systems that are operationally deployable. Furthermore, a defunct knowledge graph means that much of the infrastructure for querying, browsing, and manipulating triples no longer exists. To address this problem, we present SimpleDBpediaQA, a new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SimpleQuestions entities and predicates from Freebase to DBpedia. Although this mapping is conceptually straightforward, there are a number of nuances that make the task non-trivial, owing to the different conceptual organizations of the two knowledge graphs. To lay the foundation for future research using this dataset, we leverage recent work to provide simple yet strong baselines with and without neural networks.

pdf bib
Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks
Salman Mohammed | Peng Shi | Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We examine the problem of question answering over knowledge graphs, focusing on simple questions that can be answered by the lookup of a single fact. Adopting a straightforward decomposition of the problem into entity detection, entity linking, relation prediction, and evidence combination, we explore simple yet strong baselines. On the popular SimpleQuestions dataset, we find that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach the state of the art, and techniques that do not use neural networks also perform reasonably well. These results show that gains from sophisticated deep learning techniques proposed in the literature are quite modest and that some previous models exhibit unnecessary complexity.

pdf bib
Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures
Zhucheng Tu | Mengping Li | Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon’s Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. All virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.

pdf bib
CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities
Yiyun Liang | Zhucheng Tu | Laetitia Huang | Jimmy Lin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We demonstrate a JavaScript implementation of a convolutional neural network that performs feedforward inference completely in the browser. Such a deployment means that models can run completely on the client, on a wide range of devices, without making backend server requests. This design is useful for applications with stringent latency requirements or low connectivity. Our evaluations show the feasibility of JavaScript as a deployment target. Furthermore, an in-browser implementation enables seamless integration with the JavaScript ecosystem for information visualization, providing opportunities to visually inspect neural networks and better understand their inner workings.

2017

pdf bib
An Insight Extraction System on BioMedical Literature with Deep Neural Networks
Hua He | Kris Ganjam | Navendu Jain | Jessica Lundin | Ryen White | Jimmy Lin
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Mining biomedical text offers an opportunity to automatically discover important facts and infer associations among them. As new scientific findings appear across a large collection of biomedical publications, our aim is to tap into this literature to automate biomedical knowledge extraction and identify important insights from it. Towards that goal, we develop a system with novel deep neural networks to extract insights from biomedical literature. Evaluation shows our system is able to provide insights with competitive accuracy of human acceptance and its relation extraction component outperforms previous work.

2016

pdf bib
Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement
Hua He | Jimmy Lin
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement
Hua He | John Wieting | Kevin Gimpel | Jinfeng Rao | Jimmy Lin
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
Hua He | Kevin Gimpel | Jimmy Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars
Hua He | Jimmy Lin | Adam Lopez
Transactions of the Association for Computational Linguistics, Volume 3

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on general purpose graphics processing units (GPUs), but these algorithms do not work for hierarchical models, which require matching patterns that contain gaps. We address this limitation by presenting a novel GPU algorithm for on-demand hierarchical grammar extraction that is at least an order of magnitude faster than a comparable CPU algorithm when processing large batches of sentences. In terms of end-to-end translation, with decoding on the CPU, we increase throughput by roughly two thirds on a standard MT evaluation dataset. The GPU necessary to achieve these improvements increases the cost of a server by about a third. We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.

2013

pdf bib
Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
Hua He | Jimmy Lin | Adam Lopez
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
NAACL HLT 2013 Tutorial Abstracts
Jimmy Lin | Katrin Erk
NAACL HLT 2013 Tutorial Abstracts

pdf bib
Towards Efficient Large-Scale Feature-Rich Statistical Machine Translation
Vladimir Eidelman | Ke Wu | Ferhan Ture | Philip Resnik | Jimmy Lin
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce
Vladimir Eidelman | Ke Wu | Ferhan Ture | Philip Resnik | Jimmy Lin
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

pdf bib
Combining Statistical Translation Techniques for Cross-Language Information Retrieval
Ferhan Ture | Jimmy Lin | Douglas Oard
Proceedings of COLING 2012

pdf bib
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling
Ferhan Ture | Jimmy Lin
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Putting the User in the Loop: Interactive Maximal Marginal Relevance for Query-Focused Summarization
Jimmy Lin | Nitin Madnani | Bonnie Dorr
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Data-Intensive Text Processing with MapReduce
Jimmy Lin | Chris Dyer
NAACL HLT 2010 Tutorial Abstracts

2009

pdf bib
Data Intensive Text Processing with MapReduce
Jimmy Lin | Chris Dyer
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

2008

pdf bib
Exploring Large-Data Issues in the Curriculum: A Case Study with MapReduce
Jimmy Lin
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

pdf bib
Fast, Easy, and Cheap: Construction of Statistical Machine Translation Models with MapReduce
Chris Dyer | Aaron Cordova | Alex Mont | Jimmy Lin
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce
Jimmy Lin
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Pairwise Document Similarity in Large Collections with MapReduce
Tamer Elsayed | Jimmy Lin | Douglas Oard
Proceedings of ACL-08: HLT, Short Papers

pdf bib
Proceedings of the ACL-08: HLT Demo Session
Jimmy Lin
Proceedings of the ACL-08: HLT Demo Session

2007

pdf bib
Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series
Jimmy Lin
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Concept Disambiguation for Improved Subject Access Using Multiple Knowledge Sources
Tandeep Sidhu | Judith Klavans | Jimmy Lin
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

pdf bib
Answering Clinical Questions with Knowledge-Based and Statistical Techniques
Dina Demner-Fushman | Jimmy Lin
Computational Linguistics, Volume 33, Number 1, March 2007

pdf bib
Different Structures for Evaluating Answers to Complex Questions: Pyramids Won’t Topple, and Neither Will Human Assessors
Hoa Trang Dang | Jimmy Lin
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Leveraging Recurrent Phrase Structure in Large-scale Ontology Translation
G. Craig Murray | Bonnie J. Dorr | Jimmy Lin | Jan Hajič | Pavel Pecina
Proceedings of the 11th Annual conference of the European Association for Machine Translation

pdf bib
Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering
Dina Demner-Fushman | Jimmy Lin
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation
G. Craig Murray | Bonnie J. Dorr | Jimmy Lin | Jan Hajič | Pavel Pecina
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
The Role of Information Retrieval in Answering Complex Questions
Jimmy Lin
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Situated Question Answering in the Clinical Domain: Selecting the Best Drug Treatment for Diseases
Dina Demner-Fushman | Jimmy Lin
Proceedings of the Workshop on Task-Focused Summarization and Question Answering

pdf bib
Generative Content Models for Structural Analysis of Medical Abstracts
Jimmy Lin | Damianos Karakos | Dina Demner-Fushman | Sanjeev Khudanpur
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

pdf bib
Will Pyramids Built of Nuggets Topple Over?
Jimmy Lin | Dina Demner-Fushman
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
Automatically Evaluating Answers to Definition Questions
Jimmy Lin | Dina Demner-Fushman
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Evaluating Summaries and Answers: Two Sides of the Same Coin?
Jimmy Lin | Dina Demner-Fushman
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

2004

pdf bib
Fine-Grained Lexical Semantic Representations and Compositionally-Derived Events in Mandarin Chinese
Jimmy Lin
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

pdf bib
Answering Definition Questions with Multiple Knowledge Sources
Wesley Hildebrandt | Boris Katz | Jimmy Lin
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

pdf bib
A Computational Framework for Non-Lexicalist Semantics
Jimmy Lin
Proceedings of the Student Research Workshop at HLT-NAACL 2004

2003

pdf bib
Extracting Structural Paraphrases from Aligned Monolingual Corpora
Ali Ibrahim | Boris Katz | Jimmy Lin
Proceedings of the Second International Workshop on Paraphrasing

2002

pdf bib
The Web as a Resource for Question Answering: Perspectives and Challenges
Jimmy Lin
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Annotating the Semantic Web Using Natural Language
Boris Katz | Jimmy Lin
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

2001

pdf bib
Gathering Knowledge for a Question Answering System from Heterogeneous Information Sources
Boris Katz | Jimmy Lin | Sue Felshin
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

2000

pdf bib
REXTOR: A System for Generating Relations from Natural Language
Boris Katz | Jimmy Lin
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval
