Dipanjan Das


pdf bib
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan | Gonçalo Simões | Yao Zhao | Joshua Maynez | Dipanjan Das | Michael Collins | Mirella Lapata
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural generation models (FROST, Narayan et al, 2021) that are trained to first create a composition of the output and then generate by conditioning on it and the input. Our approach avoids text degeneration by first sampling a composition in the form of an entity chain and then using beam search to generate the best possible text grounded to this entity chain. Experiments on summarization (CNN/DailyMail and XSum) and question generation (SQuAD), using existing and newly proposed automaticmetrics together with human-based evaluation, demonstrate that Composition Sampling is currently the best available decoding strategy for generating diverse meaningful outputs.


pdf bib
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin | David Reitter | Gaurav Singh Tomar | Dipanjan Das
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence as well as more subjective or chit-chat style responses. We propose different evaluation measures to disentangle these different styles of responses by quantifying the informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the usage of additional controls at decoding time using resampling techniques. In addition to automatic metrics, we perform a human evaluation study where raters judge the output of these controlled generation models to be generally more objective and faithful to the evidence compared to baseline dialogue systems.

pdf bib
Decontextualization: Making Sentences Stand-Alone
Eunsol Choi | Jennimaria Palomaki | Matthew Lamm | Tom Kwiatkowski | Dipanjan Das | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 9

Abstract Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user-facing task, and as preprocessing for systems that perform document understanding. We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.

pdf bib
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann | Tosin Adewumi | Karmanya Aggarwal | Pawan Sasanka Ammanamanchi | Anuoluwapo Aremu | Antoine Bosselut | Khyathi Raghavi Chandu | Miruna-Adriana Clinciu | Dipanjan Das | Kaustubh Dhole | Wanyu Du | Esin Durmus | Ondřej Dušek | Chris Chinenye Emezue | Varun Gangal | Cristina Garbacea | Tatsunori Hashimoto | Yufang Hou | Yacine Jernite | Harsh Jhamtani | Yangfeng Ji | Shailza Jolly | Mihir Kale | Dhruv Kumar | Faisal Ladhak | Aman Madaan | Mounica Maddela | Khyati Mahajan | Saad Mahamood | Bodhisattwa Prasad Majumder | Pedro Henrique Martins | Angelina McMillan-Major | Simon Mille | Emiel van Miltenburg | Moin Nadeem | Shashi Narayan | Vitaly Nikolaev | Andre Niyongabo Rubungo | Salomey Osei | Ankur Parikh | Laura Perez-Beltrachini | Niranjan Ramesh Rao | Vikas Raunak | Juan Diego Rodriguez | Sashank Santhanam | João Sedoc | Thibault Sellam | Samira Shaikh | Anastasia Shimorina | Marco Antonio Sobrevilla Cabezudo | Hendrik Strobelt | Nishant Subramani | Wei Xu | Diyi Yang | Akhila Yerukola | Jiawei Zhou
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.


pdf bib
ToTTo: A Controlled Table-To-Text Generation Dataset
Ankur Parikh | Xuezhi Wang | Sebastian Gehrmann | Manaal Faruqui | Bhuwan Dhingra | Diyi Yang | Dipanjan Das
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.

pdf bib
Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Junghyun Min | R. Thomas McCoy | Dipanjan Das | Emily Pitler | Tal Linzen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model’s limitations, but rather by the paucity of crowdsourced NLI examples that might convey the importance of syntactic structure at the fine-tuning stage. We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus. The best-performing augmentation method, subject/object inversion, improved BERT’s accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set. This improvement generalized beyond the particular construction used for data augmentation, suggesting that augmentation causes BERT to recruit abstract syntactic representations.

pdf bib
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam | Dipanjan Das | Ankur Parikh
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Text generation has made significant advances in the last few years. Yet, evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate poorly with human judgment. We propose BLEURT, a learned evaluation metric for English based on BERT. BLEURT can model human judgment with a few thousand possibly biased training examples. A key aspect of our approach is a novel pre-training scheme that uses millions of synthetic examples to help the model generalize. BLEURT provides state-of-the-art results on the last three years of the WMT Metrics shared task and the WebNLG data set. In contrast to a vanilla BERT-based approach, it yields superior results even when the training data is scarce and out-of-distribution.

pdf bib
Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
Thibault Sellam | Amy Pu | Hyung Won Chung | Sebastian Gehrmann | Qijun Tan | Markus Freitag | Dipanjan Das | Ankur Parikh
Proceedings of the Fifth Conference on Machine Translation

The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published which uses transfer learning. We extend the metric beyond English and evaluate it on 14 language pairs for which fine-tuning data is available, as well as 4 “zero-shot” language pairs, for which we have no labelled examples. Additionally, we focus on English to German and demonstrate how to combine BLEURT’s predictions with those of YiSi and use alternative reference translations to enhance the performance. Empirical results show that the models achieve competitive results on the WMT Metrics 2019 Shared Task, indicating their promise for the 2020 edition.


pdf bib
Text Generation with Exemplar-based Adaptive Decoding
Hao Peng | Ankur Parikh | Manaal Faruqui | Bhuwan Dhingra | Dipanjan Das
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a novel conditioned text generation model. It draws inspiration from traditional template-based text generation techniques, where the source provides the content (i.e., what to say), and the template influences how to say it. Building on the successful encoder-decoder paradigm, it first encodes the content representation from the given input text; to produce the output, it retrieves exemplar text from the training data as “soft templates,” which are then used to construct an exemplar-specific decoder. We evaluate the proposed model on abstractive text summarization and data-to-text generation. Empirical results show that this model achieves strong performance and outperforms comparable baselines.

pdf bib
BERT Rediscovers the Classical NLP Pipeline
Ian Tenney | Dipanjan Das | Ellie Pavlick
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference. Qualitative analysis reveals that the model can and often does adjust this pipeline dynamically, revising lower-level decisions on the basis of disambiguating information from higher-level representations.

pdf bib
Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra | Manaal Faruqui | Ankur Parikh | Ming-Wei Chang | Dipanjan Das | William Cohen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments when those references diverge. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. Through a large scale human evaluation study of table-to-text models for WikiBio, we show that PARENT correlates with human judgments better than existing text generation metrics. We also adapt and evaluate the information extraction based evaluation proposed by Wiseman et al (2017), and show that PARENT has comparable correlation to it, while being easier to use. We show that PARENT is also applicable when the reference texts are elicited from humans using the data from the WebNLG challenge.


pdf bib
WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse
Manaal Faruqui | Ellie Pavlick | Ian Tenney | Dipanjan Das
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We release a corpus of 43 million atomic edits across 8 languages. These edits are mined from Wikipedia edit history and consist of instances in which a human editor has inserted a single contiguous phrase into, or deleted a single contiguous phrase from, an existing sentence. We use the collected data to show that the language generated during editing differs from the language that we observe in standard corpora, and that models trained on edits encode different aspects of semantics and discourse than models trained on raw text. We release the full corpus as a resource to aid ongoing research in semantics, discourse, and representation learning.

pdf bib
Learning To Split and Rephrase From Wikipedia Edit History
Jan A. Botha | Manaal Faruqui | John Alex | Jason Baldridge | Dipanjan Das
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia’s edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.

pdf bib
Identifying Well-formed Natural Language Questions
Manaal Faruqui | Dipanjan Das
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.


pdf bib
Neural Paraphrase Identification of Questions with Noisy Pretraining
Gaurav Singh Tomar | Thyago Duque | Oscar Täckström | Jakob Uszkoreit | Dipanjan Das
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (replacing the word embeddings of the decomposable attention model of Parikh et al. 2016 with character n-gram representations) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.


pdf bib
Transforming Dependency Structures to Logical Forms for Semantic Parsing
Siva Reddy | Oscar Täckström | Michael Collins | Tom Kwiatkowski | Dipanjan Das | Mark Steedman | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 4

The strongly typed syntax of grammar formalisms such as CCG, TAG, LFG and HPSG offers a synchronous framework for deriving syntactic structures and semantic logical forms. In contrast—partly due to the lack of a strong type system—dependency structures are easy to annotate and have become a widely used form of syntactic analysis for many languages. However, the lack of a type system makes a formal mechanism for deriving logical forms from dependency structures challenging. We address this by introducing a robust system based on the lambda calculus for deriving neo-Davidsonian logical forms from dependency trees. These logical forms are then used for semantic parsing of natural language to Freebase. Experiments on the Free917 and Web-Questions datasets show that our representation is superior to the original dependency trees and that it outperforms a CCG-based representation on this task. Compared to prior work, we obtain the strongest result to date on Free917 and competitive results on WebQuestions.

pdf bib
A Decomposable Attention Model for Natural Language Inference
Ankur Parikh | Oscar Täckström | Dipanjan Das | Jakob Uszkoreit
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP
Dipanjan Das | Chris Dyer | Manaal Faruqui | Yulia Tsvetkov
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP


pdf bib
Semantic Role Labeling with Neural Network Factors
Nicholas FitzGerald | Oscar Täckström | Kuzman Ganchev | Dipanjan Das
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Efficient Inference and Structured Learning for Semantic Role Labeling
Oscar Täckström | Kuzman Ganchev | Dipanjan Das
Transactions of the Association for Computational Linguistics, Volume 3

We present a dynamic programming algorithm for efficient constrained inference in semantic role labeling. The algorithm tractably captures a majority of the structural constraints examined by prior work in this area, which has resorted to either approximate methods or off-the-shelf integer linear programming solvers. In addition, it allows training a globally-normalized log-linear model with respect to constrained conditional likelihood. We show that the dynamic program is several times faster than an off-the-shelf integer linear programming solver, while reaching the same solution. Furthermore, we show that our structured model results in significant improvements over its local counterpart, achieving state-of-the-art results on both PropBank- and FrameNet-annotated corpora.


pdf bib
Semantic Frame Identification with Distributed Word Representations
Karl Moritz Hermann | Dipanjan Das | Jason Weston | Kuzman Ganchev
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Enhanced Search with Wildcards and Morphological Inflections in the Google Books Ngram Viewer
Jason Mann | David Zhang | Lu Yang | Dipanjan Das | Slav Petrov
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Frame-Semantic Parsing
Dipanjan Das | Desai Chen | André F. T. Martins | Nathan Schneider | Noah A. Smith
Computational Linguistics, Volume 40, Issue 1 - March 2014

pdf bib
Statistical Models for Frame-Semantic Parsing
Dipanjan Das
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

pdf bib
Learning Compact Lexicons for CCG Semantic Parsing
Yoav Artzi | Dipanjan Das | Slav Petrov
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


pdf bib
Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
Kuzman Ganchev | Dipanjan Das
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
Oscar Täckström | Dipanjan Das | Slav Petrov | Ryan McDonald | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 1

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.

pdf bib
Universal Dependency Annotation for Multilingual Parsing
Ryan McDonald | Joakim Nivre | Yvonne Quirmbach-Brundage | Yoav Goldberg | Dipanjan Das | Kuzman Ganchev | Keith Hall | Slav Petrov | Hao Zhang | Oscar Täckström | Claudia Bedini | Núria Bertomeu Castelló | Jungmee Lee
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


pdf bib
An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints
Dipanjan Das | André F. T. Martins | Noah A. Smith
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties
Dipanjan Das | Noah A. Smith
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Universal Part-of-Speech Tagset
Slav Petrov | Dipanjan Das | Ryan McDonald
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments, that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.


pdf bib
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Dipanjan Das | Slav Petrov
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
Dipanjan Das | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Kevin Gimpel | Nathan Schneider | Brendan O’Connor | Dipanjan Das | Daniel Mills | Jacob Eisenstein | Michael Heilman | Dani Yogatama | Jeffrey Flanigan | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Shay B. Cohen | Dipanjan Das | Noah A. Smith
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing


pdf bib
SEMAFOR: Frame Argument Resolution with Log-Linear Models
Desai Chen | Nathan Schneider | Dipanjan Das | Noah A. Smith
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Distributed Asynchronous Online Learning for Natural Language Processing
Kevin Gimpel | Dipanjan Das | Noah A. Smith
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

pdf bib
Movie Reviews and Revenues: An Experiment in Text Regression
Mahesh Joshi | Dipanjan Das | Kevin Gimpel | Noah A. Smith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Probabilistic Frame-Semantic Parsing
Dipanjan Das | Nathan Schneider | Desai Chen | Noah A. Smith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics


pdf bib
Non-textual Event Summarization by Applying Machine Learning to Template-based Language Generation
Mohit Kumar | Dipanjan Das | Sachin Agarwal | Alexander Rudnicky
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

pdf bib
Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition
Dipanjan Das | Noah A. Smith
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP


pdf bib
Automatic Extraction of Briefing Templates
Dipanjan Das | Mohit Kumar | Alexander I. Rudnicky
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Stacking Dependency Parsers
André Filipe Torres Martins | Dipanjan Das | Noah A. Smith | Eric P. Xing
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing