Dragomir Radev

Also published as: Dragomir R. Radev


2021

Improving Cross-lingual Text Classification with Zero-shot Instance-Weighting
Irene Li | Prithviraj Sen | Huaiyu Zhu | Yunyao Li | Dragomir Radev
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Cross-lingual text classification (CLTC) is a challenging task made even harder by the lack of labeled data in low-resource languages. In this paper, we propose zero-shot instance-weighting, a general model-agnostic zero-shot learning framework for improving CLTC by leveraging source instance weighting. It adds a module on top of pre-trained language models for similarity computation of instance weights, thus aligning each source instance to the target language. During training, the framework utilizes gradient descent that is weighted by instance weights to update parameters. We evaluate this framework over seven target languages on three fundamental tasks and show its effectiveness and extensibility, improving F1 score by up to 4% in single-source transfer and 8% in multi-source transfer. To the best of our knowledge, our method is the first to apply instance weighting in zero-shot CLTC. It is simple yet effective and easily extensible into multi-source transfer.
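
The idea of weighting each source instance by its similarity to the target language, and then scaling its contribution to the loss, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the use of cosine similarity, the averaging over target vectors, and the function names `instance_weights` and `weighted_loss` are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def instance_weights(source_vecs, target_vecs):
    """Assumed scheme: weight each source instance by its mean cosine
    similarity to the target instances, then normalize to sum to 1."""
    raw = []
    for s in source_vecs:
        sims = [cosine(s, t) for t in target_vecs]
        # Clip at zero so dissimilar instances contribute nothing.
        raw.append(max(0.0, sum(sims) / len(sims)))
    total = sum(raw)
    if total == 0:
        # Fall back to uniform weights when nothing is similar.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

def weighted_loss(per_instance_losses, weights):
    """Weighted sum of per-instance losses; its gradient is the
    instance-weighted gradient used for the parameter update."""
    return sum(w * l for w, l in zip(weights, per_instance_losses))

# Toy vectors standing in for encoder outputs (hypothetical values).
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.1], [0.9, 0.0]]
w = instance_weights(source, target)
loss = weighted_loss([0.5, 2.0], w)
```

In this toy case the first source instance points in nearly the same direction as the target vectors, so it receives most of the weight and dominates the loss.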

DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan | Dragomir Radev | Rui Zhang | Amrit Rau | Abhinand Sivaprasad | Chiachun Hsieh | Xiangru Tang | Aadit Vyas | Neha Verma | Pranav Krishna | Yangxiaokang Liu | Nadia Irwanto | Jessica Pan | Faiaz Rahman | Ahmad Zaidi | Mutethia Mutuma | Yasin Tarabar | Ankit Gupta | Tao Yu | Yi Chern Tan | Xi Victoria Lin | Caiming Xiong | Richard Socher | Nazneen Fatema Rajani
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and spoken dialogue systems by utilizing techniques including tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.

Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation
Alexander Fabbri | Simeng Han | Haoyuan Li | Haoran Li | Marjan Ghazvininejad | Shafiq Joty | Dragomir Radev | Yashar Mehdad
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner. WikiTransfer fine-tunes pretrained models on pseudo-summaries, produced from generic Wikipedia data, which contain characteristics of the target dataset, such as the length and level of abstraction of the desired summaries. WikiTransfer models achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional diverse datasets. These models are more robust to noisy data and also achieve better or comparable few-shot performance using 10 and 100 training examples when compared to few-shot transfer from other summarization datasets. To further boost performance, we employ data augmentation via round-trip translation as well as introduce a regularization term for improved few-shot transfer. To understand the role of dataset aspects in transfer performance and the quality of the resulting output summaries, we further study the effect of the components of our unsupervised fine-tuning data and analyze few-shot performance using both automatic and human evaluation.

QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization
Ming Zhong | Da Yin | Tao Yu | Ahmad Zaidi | Mutethia Mutuma | Rahul Jha | Ahmed Hassan Awadallah | Asli Celikyilmaz | Yang Liu | Xipeng Qiu | Dragomir Radev
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Meetings are a key component of human collaboration. As increasing numbers of meetings are recorded and transcribed, meeting summaries have become essential to remind those who may or may not have attended the meetings about the key decisions made and the tasks to be completed. However, it is hard to create a single short summary that covers all the content of a long meeting involving multiple people and topics. In order to satisfy the needs of different types of users, we define a new query-based multi-domain meeting summarization task, where models have to select and summarize relevant spans of meetings in response to a query, and we introduce QMSum, a new benchmark for this task. QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains. Besides, we investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task. Experimental results and manual analysis reveal that QMSum presents significant challenges in long meeting summarization for future research. Dataset is available at https://github.com/Yale-LILY/QMSum.

ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining
Alexander Fabbri | Faiaz Rahman | Imad Rizvi | Borui Wang | Haoran Li | Yashar Mehdad | Dragomir Radev
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While online conversations can cover a vast amount of information in many different formats, abstractive text summarization has primarily focused on modeling solely news articles. This research gap is due, in part, to the lack of standardized datasets for summarizing online discussions. To address this gap, we design annotation protocols motivated by an issues–viewpoints–assertions framework to crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data. To create a comprehensive benchmark, we also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. Furthermore, we incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations.

Unsupervised Cross-Domain Prerequisite Chain Learning using Variational Graph Autoencoders
Irene Li | Vanessa Yan | Tianxiao Li | Rihao Qu | Dragomir Radev
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Learning prerequisite chains is an important task for one to pick up knowledge efficiently in both known and unknown domains. For example, one may be an expert in the natural language processing (NLP) domain, but want to determine the best order in which to learn new concepts in an unfamiliar Computer Vision domain (CV). Both domains share some common concepts, such as machine learning basics and deep learning models. In this paper, we solve the task of unsupervised cross-domain concept prerequisite chain learning, using an optimized variational graph autoencoder. Our model learns to transfer concept prerequisite relations from an information-rich domain (source domain) to an information-poor domain (target domain), substantially surpassing other baseline models. In addition, we expand an existing dataset by introducing two new domains: CV and Bioinformatics (BIO). The annotated data and resources as well as the code will be made publicly available.

DocNLI: A Large-scale Dataset for Document-level Natural Language Inference
Wenpeng Yin | Dragomir Radev | Caiming Xiong
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning
Irene Li | Alexander Fabbri | Swapnil Hingmire | Dragomir Radev
Proceedings of the 28th International Conference on Computational Linguistics

The task of concept prerequisite chain learning is to automatically determine the existence of prerequisite relationships among concept pairs. In this paper, we frame learning prerequisite relationships among concepts as an unsupervised task with no access to labeled concept pairs during training. We propose a model called the Relational-Variational Graph AutoEncoder (R-VGAE) to predict concept relations within a graph consisting of concept and resource nodes. Results show that our unsupervised approach outperforms graph-based semi-supervised methods and other baseline methods by up to 9.77% and 10.47% in terms of prerequisite relation prediction accuracy and F1 score. Our method is notably the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. We also expand an existing corpus which totals 1,717 English Natural Language Processing (NLP)-related lecture slide files and manual concept pair annotations over 322 topics.

ESPRIT: Explaining Solutions to Physical Reasoning Tasks
Nazneen Fatema Rajani | Rui Zhang | Yi Chern Tan | Stephan Zheng | Jeremy Weiss | Aadit Vyas | Abhijit Gupta | Caiming Xiong | Richard Socher | Dragomir Radev
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural networks lack the ability to reason about qualitative physics and so cannot generalize to scenarios and tasks unseen during training. We propose ESPRIT, a framework for commonsense reasoning about qualitative physics in natural language that generates interpretable descriptions of physical events. We use a two-step approach of first identifying the pivotal physical events in an environment and then generating natural language descriptions of those events using a data-to-text approach. Our framework learns to generate explanations of how the physical simulation will causally evolve so that an agent or a human can easily reason about a solution using those interpretable descriptions. Human evaluations indicate that ESPRIT produces crucial fine-grained details and has high coverage of physical concepts compared to even human annotations. Dataset, code and documentation are available at https://github.com/salesforce/esprit.

Proceedings of the First Workshop on Interactive and Executable Semantic Parsing
Ben Bogin | Srinivasan Iyer | Victoria Lin | Dragomir Radev | Alane Suhr | Panupong | Caiming Xiong | Pengcheng Yin | Tao Yu | Rui Zhang | Victor Zhong
Proceedings of the First Workshop on Interactive and Executable Semantic Parsing

Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start
Wenpeng Yin | Nazneen Fatema Rajani | Dragomir Radev | Richard Socher | Caiming Xiong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

A standard way to address different NLP problems is by first constructing a problem-specific dataset, then building a model to fit this dataset. To build the ultimate artificial intelligence, we desire a single machine that can handle diverse new problems, for which task-specific annotations are limited. We bring up textual entailment as a unified solver for such NLP problems. However, current research on textual entailment has not spilled much ink on the following questions: (i) How well does a pretrained textual entailment system generalize across domains with only a handful of domain-specific examples? and (ii) When is it worth transforming an NLP task into textual entailment? We argue that the transformation is unnecessary if we can obtain rich annotations for this task. Textual entailment matters particularly when the target NLP task has insufficient annotations. Universal NLP can probably be achieved through different routes. In this work, we introduce Universal Few-shot textual Entailment (UFO-Entail). We demonstrate that this framework enables a pretrained entailment model to work well on new entailment domains in a few-shot setting, and show its effectiveness as a unified solver for several downstream NLP tasks such as question answering and coreference resolution when the end-task annotations are limited.

2019

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Tao Yu | Rui Zhang | Heyang Er | Suyi Li | Eric Xue | Bo Pang | Xi Victoria Lin | Yi Chern Tan | Tianze Shi | Zihan Li | Youxuan Jiang | Michihiro Yasunaga | Sungrok Shim | Tao Chen | Alexander Fabbri | Zifan Li | Luyao Chen | Yuwen Zhang | Shreya Dixit | Vincent Zhang | Caiming Xiong | Richard Socher | Walter Lasecki | Dragomir Radev
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.

Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
Rui Zhang | Tao Yu | Heyang Er | Sungrok Shim | Eric Xue | Xi Victoria Lin | Tianze Shi | Caiming Xiong | Richard Socher | Dragomir Radev
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previous predicted query to improve the generation quality. Our editing mechanism views SQL as sequences and reuses generation results at the token level in a simple manner. It is flexible to change individual tokens and robust to error propagation. Furthermore, to deal with complex table structures in different domains, we employ an utterance-table encoder and a table-aware decoder to incorporate the context of the user utterance and the table schema. We evaluate our approach on the SParC dataset and demonstrate the benefit of editing compared with the state-of-the-art baselines which generate SQL from scratch. Our code is available at https://github.com/ryanzhumich/sparc_atis_pytorch.

Syntax-aware Neural Semantic Role Labeling with Supertags
Jungo Kasai | Dan Friedman | Robert Frank | Dragomir Radev | Owen Rambow
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models that performed SRL on the basis of a full dependency parse with more recent models that use no syntactic information at all. Our local and non-ensemble model achieves state-of-the-art performance on the CoNLL 09 English and Spanish datasets. SRL models benefit from syntactic information, and we show that supertagging is a simple, powerful, and robust way to incorporate syntax into a neural SRL system.

Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
Alexander Fabbri | Irene Li | Tianwei She | Suyi Li | Dragomir Radev
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic generation of summaries from multiple news articles is a valuable tool as the number of online publications grows rapidly. Single document summarization (SDS) systems have benefited from advances in neural encoder-decoder models thanks to the availability of large datasets. However, multi-document summarization (MDS) of news articles has been limited to datasets of a couple of hundred examples. In this paper, we introduce Multi-News, the first large-scale MDS news dataset. Additionally, we propose an end-to-end model which incorporates a traditional extractive summarization model with a standard SDS model and achieves competitive results on MDS datasets. We benchmark several methods on Multi-News and hope that this work will promote advances in summarization in the multi-document setting.

Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations
Rui Zhang | Caitlin Westerfield | Sungrok Shim | Garrett Bingham | Alexander Fabbri | William Hu | Neha Verma | Dragomir Radev
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the Material dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.

SParC: Cross-Domain Semantic Parsing in Context
Tao Yu | Rui Zhang | Michihiro Yasunaga | Yi Chern Tan | Xi Victoria Lin | Suyi Li | Heyang Er | Irene Li | Bo Pang | Tao Chen | Emily Ji | Shreya Dixit | David Proctor | Sungrok Shim | Jonathan Kraft | Vincent Zhang | Caiming Xiong | Richard Socher | Dragomir Radev
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present SParC, a dataset for cross-domain Semantic Parsing in Context that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). It is obtained from controlled user interactions with 200 complex databases over 138 domains. We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets. SParC (1) demonstrates complex contextual dependencies, (2) has greater semantic diversity, and (3) requires generalization to unseen domains due to its cross-domain nature and the unseen databases at test time. We experiment with two state-of-the-art text-to-SQL models adapted to the context-dependent, cross-domain setup. The best model obtains an exact match accuracy of 20.2% over all questions and less than 10% over all interaction sequences, indicating that the cross-domain setting and the contextual phenomena of the dataset present significant challenges for future research. The dataset, baselines, and leaderboard are released at https://yale-lily.github.io/sparc.
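
The gap between the question-level and interaction-level figures follows from how the two scores are counted: an interaction only counts as correct when every question in it is correct. The sketch below illustrates that counting. It is a simplification under stated assumptions: it compares whitespace- and case-normalized SQL strings rather than using the benchmark's own matching procedure, and the function names and example queries (tables `singer` and `concert`) are hypothetical.

```python
def normalize(sql):
    """Lowercase and collapse whitespace. A crude stand-in for real SQL
    matching (it would also lowercase string literals)."""
    return " ".join(sql.lower().split())

def exact_match_scores(interactions):
    """interactions: list of interactions, each a list of
    (predicted_sql, gold_sql) pairs, one pair per question.
    Returns (question_accuracy, interaction_accuracy)."""
    total_q = correct_q = correct_i = 0
    for turns in interactions:
        hits = [normalize(p) == normalize(g) for p, g in turns]
        total_q += len(hits)
        correct_q += sum(hits)
        # An interaction counts only if every one of its questions matches.
        correct_i += all(hits)
    return correct_q / total_q, correct_i / len(interactions)

demo = [
    [("SELECT name FROM singer", "select name from singer"),
     ("SELECT name FROM singer WHERE age > 30",
      "SELECT name FROM singer WHERE age > 20")],
    [("SELECT count(*) FROM concert", "SELECT count(*)  FROM concert")],
]
q_acc, i_acc = exact_match_scores(demo)
```

Here two of three questions match but only one of two interactions is fully correct, so the interaction-level score is the lower of the two.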

2018

Robust Multilingual Part-of-Speech Tagging via Adversarial Training
Michihiro Yasunaga | Jungo Kasai | Dragomir Radev
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Adversarial training (AT) is a powerful regularization method for neural networks, aiming to achieve robustness to input perturbations. Yet, the specific effects of the robustness obtained from AT are still unclear in the context of natural language processing. In this paper, we propose and analyze a neural POS tagging model that exploits AT. In our experiments on the Penn Treebank WSJ corpus and the Universal Dependencies (UD) dataset (27 languages), we find that AT not only improves the overall tagging accuracy, but also 1) prevents over-fitting well in low resource languages and 2) boosts tagging accuracy for rare / unseen words. We also demonstrate that 3) the improved tagging performance by AT contributes to the downstream task of dependency parsing, and that 4) AT helps the model to learn cleaner word representations. 5) The proposed AT model is generally effective in different sequence labeling tasks. These positive results motivate further use of AT for natural language tasks.

TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation
Tao Yu | Zifan Li | Zilin Zhang | Rui Zhang | Dragomir Radev
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Interacting with relational databases through natural language helps users with any background easily query and analyze a vast amount of data. This requires a system that understands users’ questions and converts them to SQL queries automatically. In this paper, we present a novel approach, TypeSQL, which frames the problem as a slot filling task in a more reasonable way. In addition, TypeSQL utilizes type information to better understand rare entities and numbers in the questions. We test this idea on the WikiSQL dataset and outperform the prior art by 6% in much shorter time. We also show that accessing the content of databases can significantly improve the performance when users’ queries are not well-formed. TypeSQL can reach 82.6% accuracy, a 17.5% absolute improvement compared to the previous content-sensitive model.

Improving Text-to-SQL Evaluation Methodology
Catherine Finegan-Dollak | Jonathan K. Kummerfeld | Li Zhang | Karthik Ramanathan | Sesh Sadasivam | Rui Zhang | Dragomir Radev
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.
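
One way to read the complementary split is as a split over queries rather than over questions, so that no SQL query seen in training reappears at test time. The sketch below illustrates that reading only. Grouping by a normalized SQL string, the `test_fraction` parameter, the function name `query_split`, and the example questions and tables are all assumptions for illustration and are not taken from the released datasets.

```python
from collections import defaultdict

def query_split(examples, test_fraction=0.25):
    """examples: list of (question, sql) pairs. All questions that share a
    SQL query land on the same side, so test queries are unseen in training."""
    groups = defaultdict(list)
    for question, sql in examples:
        key = " ".join(sql.lower().split())  # crude normalization (assumption)
        groups[key].append((question, sql))
    keys = sorted(groups)  # deterministic order for the sketch
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = keys[:n_test]
    train = [ex for k in keys[n_test:] for ex in groups[k]]
    test = [ex for k in test_keys for ex in groups[k]]
    return train, test

examples = [
    ("How many flights are there?", "SELECT count(*) FROM flight"),
    ("Count the flights.", "select count(*) from flight"),
    ("List all city names.", "SELECT name FROM city"),
    ("What are the names of the cities?", "SELECT name FROM city"),
    ("Show airline names.", "SELECT name FROM airline"),
    ("Which airports exist?", "SELECT name FROM airport"),
]
train, test = query_split(examples)
```

Both paraphrases of the flight-count question end up in the test set together, so a system cannot score well there just by recognizing a rephrasing of a training question.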

TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation
Alexander Fabbri | Irene Li | Prawat Trairatvorakul | Yijiao He | Weitai Ting | Robert Tung | Caitlin Westerfield | Dragomir Radev
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The field of Natural Language Processing (NLP) is growing rapidly, with new research published daily along with an abundance of tutorials, codebases and other online resources. In order to learn this dynamic field or stay up-to-date on the latest research, students as well as educators and researchers must constantly sift through multiple sources to find valuable, relevant information. To address this situation, we introduce TutorialBank, a new, publicly available dataset which aims to facilitate NLP education and research. We have manually collected and categorized over 5,600 resources on NLP as well as the related fields of Artificial Intelligence (AI), Machine Learning (ML) and Information Retrieval (IR). Our dataset is notably the largest manually-picked corpus of resources intended for NLP education which does not include only academic papers. Additionally, we have created both a search engine and a command-line tool for the resources and have annotated the corpus to include lists of research topics, relevant resources for each topic, prerequisite relations among topics, relevant sub-parts of individual resources, among other annotations. We are releasing the dataset and present several avenues for further research.

Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering
Rui Zhang | Cícero Nogueira dos Santos | Michihiro Yasunaga | Bing Xiang | Dragomir Radev
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Coreference resolution aims to identify in a text all mentions that refer to the same real world entity. The state-of-the-art end-to-end neural coreference model considers all text spans in a document as potential mentions and learns to link an antecedent for each possible mention. In this paper, we propose to improve the end-to-end coreference resolution system by (1) using a biaffine attention model to get antecedent scores for each possible mention, and (2) jointly optimizing the mention detection accuracy and mention clustering accuracy given the mention cluster labels. Our model achieves the state-of-the-art performance on the CoNLL-2012 shared task English test set.

SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Tao Yu | Michihiro Yasunaga | Kai Yang | Rui Zhang | Dongxu Wang | Zifan Li | Dragomir Radev
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Most existing studies in text-to-SQL tasks do not require generating complex SQL queries with multiple clauses or sub-queries, and generalizing to new, unseen databases. In this paper we propose SyntaxSQLNet, a syntax tree network to address the complex and cross-domain text-to-SQL generation task. SyntaxSQLNet employs a SQL specific syntax tree-based decoder with SQL generation path history and table-aware column attention encoders. We evaluate SyntaxSQLNet on a new large-scale text-to-SQL corpus containing databases with multiple tables and complex SQL queries containing multiple SQL clauses and nested queries. We use a database split setting where databases in the test set are unseen during training. Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 9.5% in exact matching accuracy. To our knowledge, we are the first to study this complex text-to-SQL task. Our task and models with the latest updates are available at https://yale-lily.github.io/seq2sql/spider.

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
Tao Yu | Rui Zhang | Kai Yang | Michihiro Yasunaga | Dongxu Wang | Zifan Li | James Ma | Irene Li | Qingning Yao | Shanelle Roman | Zilin Zhang | Dragomir Radev
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present Spider, a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task so that different complicated SQL queries and databases appear in train and test sets. In this way, the task requires the model to generalize well to both new SQL queries and new database schemas. Therefore, Spider is distinct from most of the previous semantic parsing tasks because they all use a single database and have the exact same program in the train set and the test set. We experiment with various state-of-the-art models and the best model achieves only 9.7% exact matching accuracy on a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task with the most recent updates are publicly available at https://yale-lily.github.io/seq2sql/spider.

2017

Graph-based Neural Multi-Document Summarization
Michihiro Yasunaga | Rui Zhang | Kshitijh Meelu | Ayush Pareek | Krishnan Srinivasan | Dragomir Radev
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We propose a neural multi-document summarization system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences that avoid redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon other traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
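
The final extraction step, picking salient sentences while avoiding redundancy, can be sketched as a simple greedy loop. This is an illustration under assumptions rather than the system's actual procedure: the salience scores are given as plain numbers in place of GCN outputs, redundancy is measured with bag-of-words cosine similarity, and the `max_sim` threshold, the function name `greedy_select`, and the example sentences are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def greedy_select(sentences, salience, max_sentences=2, max_sim=0.5):
    """Visit sentences from most to least salient and keep one only if it
    is not too similar to any sentence already chosen."""
    bags = [Counter(s.lower().split()) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: salience[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == max_sentences:
            break
        if all(cosine(bags[i], bags[j]) < max_sim for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in chosen]

sents = [
    "the storm hit the coast on monday",
    "the storm hit the coast early monday",
    "officials opened shelters for residents",
]
summary = greedy_select(sents, salience=[0.9, 0.8, 0.6])
```

The second sentence has the second-highest score but is skipped because it nearly repeats the first, so the summary keeps the first and third sentences.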

2016

pdf bib
Nested Propositions in Open Information Extraction
Nikita Bhutani | H. V. Jagadish | Dragomir Radev
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Effects of Creativity and Cluster Tightness on Short Text Clustering Performance
Catherine Finegan-Dollak | Reed Coke | Rui Zhang | Xiangyi Ye | Dragomir Radev
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest
Dragomir Radev | Amanda Stent | Joel Tetreault | Aasish Pappu | Aikaterini Iliakopoulou | Agustin Chanfreau | Paloma de Juan | Jordi Vallmitjana | Alejandro Jaimes | Rahul Jha | Robert Mankoff
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality most strongly match the funniest captions, followed by positive sentiment. These results are useful for understanding humor and also in the design of more engaging conversational agents in text and multimodal (vision+text) systems. As part of this work, a large set of cartoons and captions is being made available to the community.

pdf bib
Sentence Similarity based on Dependency Tree Kernels for Multi-document Summarization
Şaziye Betül Özateş | Arzucan Özgür | Dragomir Radev
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce an approach that uses the dependency grammar representations of sentences to compute sentence similarity for extractive multi-document summarization. We adapt two untyped dependency tree kernels, originally proposed for relation extraction, to the multi-document summarization problem and investigate their effects. In addition, we propose a series of novel dependency-grammar-based kernels to better represent the syntactic and semantic similarities among the sentences. The proposed methods incorporate the type information of the dependency relations for sentence similarity calculation. To our knowledge, this is the first study that investigates using dependency-tree-based sentence similarity for multi-document summarization.

pdf bib
Extractive Summarization under Strict Length Constraints
Yashar Mehdad | Amanda Stent | Kapil Thadani | Dragomir Radev | Youssef Billawala | Karolina Buchner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we report a comparison of various techniques for single-document extractive summarization under strict length budgets, which is a common commercial use case (e.g. summarization of news articles by news aggregators). We show that, evaluated using ROUGE, numerous algorithms from the literature fail to beat a simple lead-based baseline for this task. However, a supervised approach with lightweight and efficient features improves over the lead-based baseline. Additional human evaluation demonstrates that the supervised approach also performs competitively with a commercial system that uses more sophisticated features.

pdf bib
A Low-Rank Approximation Approach to Learning Joint Embeddings of News Stories and Images for Timeline Summarization
William Yang Wang | Yashar Mehdad | Dragomir R. Radev | Amanda Stent
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
Rui Zhang | Honglak Lee | Dragomir R. Radev
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Content Models for Survey Generation: A Factoid-Based Evaluation
Rahul Jha | Catherine Finegan-Dollak | Ben King | Reed Coke | Dragomir Radev
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Heterogeneous Networks and Their Applications: Scientometrics, Name Disambiguation, and Topic Modeling
Ben King | Rahul Jha | Dragomir R. Radev
Transactions of the Association for Computational Linguistics, Volume 2

We present heterogeneous networks as a way to unify lexical networks with relational data. We build a unified ACL Anthology network, tying together the citation, author collaboration, and term-cooccurrence networks with affiliation and venue relations. This representation proves to be convenient and allows problems such as name disambiguation, topic modeling, and the measurement of scientific impact to be easily solved using only this network and off-the-shelf graph algorithms.

pdf bib
A Random Walk–Based Model for Identifying Semantic Orientation
Ahmed Hassan | Amjad Abu-Jbara | Wanchen Lu | Dragomir Radev
Computational Linguistics, Volume 40, Issue 3 - September 2014

pdf bib
Experiments in Sentence Language Identification with Groups of Similar Languages
Ben King | Dragomir Radev | Steven Abney
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

2013

pdf bib
Purpose and Polarity of Citation: Towards NLP-based Bibliometrics
Amjad Abu-Jbara | Jefferson Ezra | Dragomir Radev
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Experimental Results on the Native Language Identification Shared Task
Amjad Abu-Jbara | Rahul Jha | Eric Morley | Dragomir Radev
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Proceedings of the Fourth Workshop on Teaching NLP and CL
Ivan Derzhanski | Dragomir Radev
Proceedings of the Fourth Workshop on Teaching NLP and CL

pdf bib
Introducing Computational Concepts in a Linguistics Olympiad
Patrick Littell | Lori Levin | Jason Eisner | Dragomir Radev
Proceedings of the Fourth Workshop on Teaching NLP and CL

pdf bib
Random Walk Factoid Annotation for Collective Discourse
Ben King | Rahul Jha | Dragomir Radev | Robert Mankoff
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
A System for Summarizing Scientific Topics Starting from Keywords
Rahul Jha | Amjad Abu-Jbara | Dragomir Radev
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Identifying Opinion Subgroups in Arabic Online Discussions
Amjad Abu-Jbara | Ben King | Mona Diab | Dragomir Radev
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
UMichigan: A Conditional Random Field Model for Resolving the Scope of Negation
Amjad Abu-Jbara | Dragomir Radev
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants
Ahmed Hassan | Amjad Abu-Jbara | Dragomir Radev
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Rediscovering ACL Discoveries Through the Lens of ACL Anthology Network Citing Sentences
Dragomir Radev | Amjad Abu-Jbara
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

pdf bib
Extracting Signed Social Networks from Text
Ahmed Hassan | Amjad Abu-Jbara | Dragomir Radev
Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing

pdf bib
Reference Scope Identification in Citing Sentences
Amjad Abu-Jbara | Dragomir Radev
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
AttitudeMiner: Mining Attitude from Online Discussions
Amjad Abu-Jbara | Ahmed Hassan | Dragomir Radev
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Subgroup Detection in Ideological Discussions
Amjad Abu-Jbara | Pradeep Dasigi | Mona Diab | Dragomir Radev
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Subgroup Detector: A System for Detecting Subgroups in Online Discussions
Amjad Abu-Jbara | Dragomir Radev
Proceedings of the ACL 2012 System Demonstrations

2011

pdf bib
Simultaneous Similarity Learning and Feature-Weight Learning for Document Clustering
Pradeep Muthukrishnan | Dragomir Radev | Qiaozhu Mei
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing

pdf bib
Rumor has it: Identifying Misinformation in Microblogs
Vahed Qazvinian | Emily Rosengren | Dragomir R. Radev | Qiaozhu Mei
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Coherent Citation-Based Summarization of Scientific Papers
Amjad Abu-Jbara | Dragomir Radev
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
Vahed Qazvinian | Dragomir R. Radev
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Identifying the Semantic Orientation of Foreign Words
Ahmed Hassan | Amjad Abu-Jbara | Rahul Jha | Dragomir Radev
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis
Amjad Abu-Jbara | Dragomir Radev
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

pdf bib
Identifying Text Polarity Using Random Walks
Ahmed Hassan | Dragomir R. Radev
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Identifying Non-Explicit Citing Sentences for Citation-Based Summarization
Vahed Qazvinian | Dragomir R. Radev
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
What’s with the Attitude? Identifying Sentences with Attitude in Online Discussions
Ahmed Hassan | Vahed Qazvinian | Dragomir Radev
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Citation Summarization Through Keyphrase Extraction
Vahed Qazvinian | Dragomir R. Radev | Arzucan Özgür
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf bib
Detecting Speculations and their Scopes in Scientific Text
Arzucan Özgür | Dragomir R. Radev
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Using Citations to Generate surveys of Scientific Paradigms
Saif Mohammad | Bonnie Dorr | Melissa Egan | Ahmed Hassan | Pradeep Muthukrishnan | Vahed Qazvinian | Dragomir Radev | David Zajic
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Supervised Classification for Extracting Biomedical Events
Arzucan Özgür | Dragomir Radev
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

pdf bib
The ACL Anthology Network
Dragomir R. Radev | Pradeep Muthukrishnan | Vahed Qazvinian
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

2008

pdf bib
The North American Computational Linguistics Olympiad (NACLO)
Dragomir R. Radev | Lori Levin | Thomas E. Payne
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

pdf bib
Tracking the Dynamic Evolution of Participants Salience in a Discussion
Ahmed Hassan | Anthony Fader | Michael H. Crespin | Kevin M. Quinn | Burt L. Monroe | Michael Colaresi | Dragomir R. Radev
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Detecting Multiple Facets of an Event using Graph-Based Unsupervised Methods
Pradeep Muthukrishnan | Joshua Gerrish | Dragomir R. Radev
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Scientific Paper Summarization Using Citation Summary Networks
Vahed Qazvinian | Dragomir R. Radev
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics
Steven Bird | Robert Dale | Bonnie Dorr | Bryan Gibson | Mark Joseph | Min-Yen Kan | Dongwon Lee | Brett Powley | Dragomir Radev | Yee Fan Tan
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.

pdf bib
Modeling Document Dynamics: an Evolutionary Approach
Jahna Otterbacher | Dragomir Radev
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

News articles about the same event published over time have properties that challenge NLP and IR applications. A cluster of such texts typically exhibits instances of paraphrase and contradiction, as sources update the facts surrounding the story, often due to an ongoing investigation. The current hypothesis is that the stories “evolve” over time, beginning with the first text published on a given topic. This is tested using a phylogenetic approach as well as one based on language modeling. The fit of the evolutionary models is evaluated with respect to how well they facilitate the recovery of chronological relationships between the documents. Over all data clusters, the language modeling approach consistently outperforms the phylogenetic model. However, on manually collected clusters in which the documents are published within short time spans of one another, both perform similarly and produce statistically significant results on the document chronology recovery evaluation.

2007

pdf bib
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing
Chris Biemann | Irina Matveeva | Rada Mihalcea | Dragomir Radev
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing

pdf bib
Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing
Güneş Erkan | Arzucan Özgür | Dragomir R. Radev
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
MavenRank: Identifying Influential Members of the US Senate Using Lexical Centrality
Anthony Fader | Dragomir R. Radev | Michael H. Crespin | Burt L. Monroe | Kevin M. Quinn | Michael Colaresi
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Adding Syntax to Dynamic Programming for Aligning Comparable Texts for the Generation of Paraphrases
Siwei Shen | Dragomir R. Radev | Agam Patel | Güneş Erkan
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
LexNet: A Graphical Environment for Graph-Based NLP
Dragomir R. Radev | Güneş Erkan | Anthony Fader | Patrick Jordan | Siwei Shen | James P. Sweeney
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

pdf bib
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing
Rada Mihalcea | Dragomir Radev
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing

pdf bib
Lexical similarity can distinguish between automatic and manual translations
Agam Patel | Dragomir R. Radev
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We consider the problem of distinguishing automatic translations from manual translations of the same sentence. Using two different similarity metrics (BLEU and Levenshtein edit distance), we find that automatic translations are closer to each other than they are to manual translations. We also use phylogenetic trees to provide a visual representation of the distances between pairs of individual sentences in a set of translations. The differences in lexical distance are statistically significant, both for Chinese to English and for Arabic to English translations.
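
As an illustration of one of the two metrics named in this abstract, the following is a minimal sketch of Levenshtein edit distance using the standard dynamic-programming recurrence. It works at the character level, which is an assumption here; the abstract does not say whether the authors compared characters or words, and the function name is chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

A smaller distance between two translations of the same sentence indicates that they are lexically closer, which is the kind of comparison the abstract describes.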

pdf bib
Graph-based Algorithms for Natural Language Processing and Information Retrieval
Rada Mihalcea | Dragomir Radev
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2005

pdf bib
Using Random Walks for Question-focused Sentence Retrieval
Jahna Otterbacher | Güneş Erkan | Dragomir Radev
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL
Chris Brew | Dragomir Radev
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

2004

pdf bib
LexPageRank: Prestige in Multi-Document Text Summarization
Güneş Erkan | Dragomir R. Radev
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Smorgasbord of Features for Statistical Machine Translation
Franz Josef Och | Daniel Gildea | Sanjeev Khudanpur | Anoop Sarkar | Kenji Yamada | Alex Fraser | Shankar Kumar | Libin Shen | David Smith | Katherine Eng | Viren Jain | Zhen Jin | Dragomir Radev
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

pdf bib
A Scaleable Multi-document Centroid-based Summarizer
Dragomir Radev | Timothy Allison | Matthew Craig | Stanko Dimitrov | Kareem Omer | Michael Topper | Adam Winkel | Jin Yi
Demonstration Papers at HLT-NAACL 2004

pdf bib
Computational Linkuistics: Word Triggers across Hyperlinks
Dragomir Radev | Hong Qi | Adam Winkel | Daniel Tam
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation
Jahna Otterbacher | Dragomir Radev
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Multi-document summaries produced via sentence extraction often suffer from a number of cohesion problems, including dangling anaphora, sudden shifts in topic and incorrect or awkward chronological ordering. Therefore, the development of an automated revision process to correct such problems is a research area of current interest. We present the RevisionBank, a corpus of 240 extractive, multi-document summaries that have been manually revised to promote cohesion. The summaries were revised by six linguistics students using a constrained set of revision operations that we previously developed. In the current paper, we describe the process of developing a taxonomy of cohesion problems and corrective revision operators that address such problems, as well as an annotation schema for our corpus. Finally, we discuss how our taxonomy and corpus can be used for the study of revision-based multi-document summarization as well as for summary evaluation.

pdf bib
CST Bank: A Corpus for the Study of Cross-document Structural Relationships
Dragomir Radev | Jahna Otterbacher | Zhu Zhang
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Clusters of multiple news stories related to the same topic exhibit a number of interesting properties. For example, when documents have been published at various points in time or by different authors or news agencies, one finds many instances of paraphrasing, information overlap and even contradiction. The current paper presents the Cross-document Structure Theory (CST) Bank, a collection of multi-document clusters in which pairs of sentences from different documents have been annotated for cross-document structure theory relationships. We will describe how we built the corpus, including our method for reducing the number of sentence pairs to be annotated by our hired judges, using lexical similarity measures. Finally, we will describe how CST and the CST Bank can be applied to different research areas such as multi-document summarization.

pdf bib
MEAD - A Platform for Multidocument Multilingual Text Summarization
Dragomir Radev | Timothy Allison | Sasha Blair-Goldensohn | John Blitzer | Arda Çelebi | Stanko Dimitrov | Elliott Drabek | Ali Hakim | Wai Lam | Danyu Liu | Jahna Otterbacher | Hong Qi | Horacio Saggion | Simone Teufel | Michael Topper | Adam Winkel | Zhu Zhang
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Comparing Semantically Related Sentences: The Case of Paraphrase Versus Subsumption
Jahna Otterbacher | Dragomir Radev
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Evaluation Challenges in Large-Scale Document Summarization
Dragomir R. Radev | Simone Teufel | Horacio Saggion | Wai Lam | John Blitzer | Hong Qi | Arda Çelebi | Danyu Liu | Elliott Drabek
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Sub-event based multi-document summarization
Naomi Daniel | Dragomir Radev | Timothy Allison
Proceedings of the HLT-NAACL 03 Text Summarization Workshop

pdf bib
Multi-document summarization using off the shelf compression software
Amardeep Grewal | Timothy Allison | Stanko Dimitrov | Dragomir Radev
Proceedings of the HLT-NAACL 03 Text Summarization Workshop

2002

pdf bib
Developing Infrastructure for the Evaluation of Single and Multi-document Summarization Systems in a Cross-lingual Environment
Horacio Saggion | Dragomir Radev | Simone Teufel | Wai Lam | Stephanie M. Strassel
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Evaluating Web-based Question Answering Systems
Dragomir R. Radev | Hong Qi | Harris Wu | Weiguo Fan
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Revisions that improve cohesion in multi-document summaries: a preliminary study
Jahna C. Otterbacher | Dragomir R. Radev | Airong Luo
Proceedings of the ACL-02 Workshop on Automatic Summarization

pdf bib
Introduction to the Special Issue on Summarization
Dragomir R. Radev | Eduard Hovy | Kathleen McKeown
Computational Linguistics, Volume 28, Number 4, December 2002

pdf bib
Meta-evaluation of Summaries in a Cross-lingual Environment using Content-based Metrics
Horacio Saggion | Dragomir Radev | Simone Teufel | Wai Lam
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
Answering What-Is Questions by Virtual Annotation
John Prager | Dragomir Radev | Krzysztof Czuba
Proceedings of the First International Conference on Human Language Technology Research

pdf bib
NewsInEssence: A System For Domain-Independent, Real-Time News Clustering and Multi-Document Summarization
Dragomir R. Radev | Sasha Blair-Goldensohn | Zhu Zhang | Revathi Sundara Raghavan
Proceedings of the First International Conference on Human Language Technology Research

2000

pdf bib
Ranking suspected answers to natural language questions using predictive annotation
Dragomir R. Radev | John Prager | Valerie Samn
Sixth Applied Natural Language Processing Conference

pdf bib
Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies
Dragomir R. Radev | Hongyan Jing | Malgorzata Budzikowska
NAACL-ANLP 2000 Workshop: Automatic Summarization

pdf bib
A Common Theory of Information Fusion from Multiple Text Sources Step One: Cross-Document Structure
Dragomir Radev
1st SIGdial Workshop on Discourse and Dialogue

pdf bib
Automatic summarization of search engine hit lists
Dragomir R. Radev | Weiguo Fan
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval

1998

pdf bib
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
Dragomir R. Radev
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
Dragomir R. Radev
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
Generating Natural Language Summaries from Multiple On-Line Sources
Dragomir R. Radev | Kathleen R. McKeown
Computational Linguistics, Volume 24, Number 3, September 1998

1997

pdf bib
Building a Generation Knowledge Source using Internet-Accessible Newswire
Dragomir R. Radev | Kathleen R. McKeown
Fifth Conference on Applied Natural Language Processing

pdf bib
Software Re-Use and Evolution in Text Generation Applications
Karen Kukich | Rebecca Passonneau | Kathleen McKeown | Dragomir Radev | Vasileios Hatzivassiloglou | Hongyan Jing
From Research to Commercial Applications: Making NLP Work in Practice

1996

pdf bib
Using Word Class for Part-of-speech Disambiguation
Evelyne Tzoukermann | Dragomir R. Radev
Fourth Workshop on Very Large Corpora

pdf bib
An Architecture For Distributed Natural Language Summarization
Dragomir R. Radev
Eighth International Natural Language Generation Workshop (Posters and Demonstrations)
