Dipanjan Das - ACL Anthology

Dipanjan Das

2025

Dolomites: Domain-Specific Long-Form Methodical Tasks
Chaitanya Malaviya | Priyanka Agrawal | Kuzman Ganchev | Pranesh Srinivasan | Fantine Huot | Jonathan Berant | Mark Yatskar | Dipanjan Das | Mirella Lapata | Chris Alberti
Transactions of the Association for Computational Linguistics, Volume 13

Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive and require methodically generating structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total) which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models, highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences, while drawing upon the given context as well as domain knowledge. Our dataset is available at https://dolomites-benchmark.github.io/.

2023

Query Refinement Prompts for Closed-Book Long-Form QA
Reinald Kim Amplayo | Kellie Webster | Michael Collins | Dipanjan Das | Shashi Narayan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have been shown to perform well in answering questions and in producing long-form texts, both in few-shot closed-book settings. While the former can be validated using well-known evaluation metrics, the latter is difficult to evaluate. We resolve the difficulty of evaluating long-form output by doing both tasks at once: question answering that requires long-form answers. Such questions tend to be multifaceted, i.e., they may have ambiguities and/or require information from multiple sources. To this end, we define query refinement prompts that encourage LLMs to explicitly express the multifacetedness in questions and generate long-form answers covering multiple facets of the question. Our experiments on two long-form question answering datasets, ASQA and AQuAMuSe, show that using our prompts allows us to outperform fully finetuned models in the closed-book setting, as well as achieve results comparable to retrieve-then-generate open-book models.

Measuring Attribution in Natural Language Generation Models
Hannah Rashkin | Vitaly Nikolaev | Matthew Lamm | Lora Aroyo | Michael Collins | Dipanjan Das | Slav Petrov | Gaurav Singh Tomar | Iulia Turc | David Reitter
Computational Linguistics, Volume 49, Issue 4 - December 2023

Large neural models have brought a new challenge to natural language generation (NLG): It has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world is to be verified against an independent, provided source. We define AIS and a two-stage annotation pipeline for allowing annotators to evaluate model output according to annotation guidelines. We successfully validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset). We provide full annotation guidelines in the appendices and publicly release the annotated data at https://github.com/google-research-datasets/AIS.

Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
Fantine Huot | Joshua Maynez | Shashi Narayan | Reinald Kim Amplayo | Kuzman Ganchev | Annie Priyadarshini Louis | Anders Sandholm | Dipanjan Das | Mirella Lapata
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the plan in order to improve or control the generated output. A short video demonstrating our system is available at https://goo.gle/text-blueprint-demo

SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark | Shruti Rijhwani | Sebastian Gehrmann | Joshua Maynez | Roee Aharoni | Vitaly Nikolaev | Thibault Sellam | Aditya Siddhant | Dipanjan Das | Ankur Parikh
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality: comprehensibility, repetition, grammar, attribution, main ideas, and conciseness, covering 6 languages, 9 systems, and 4 datasets. As a result of its size and scope, SEAHORSE can serve both as a benchmark to evaluate learnt metrics, as well as a large-scale resource for training such metrics. We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE (Honovich et al., 2022) and mFACE (Aharoni et al., 2022). We make the SEAHORSE dataset and metrics publicly available for future research on multilingual and multifaceted summarization evaluation.

Conditional Generation with a Question-Answering Blueprint
Shashi Narayan | Joshua Maynez | Reinald Kim Amplayo | Kuzman Ganchev | Annie Louis | Fantine Huot | Anders Sandholm | Dipanjan Das | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 11

The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. We propose a new conceptualization of text plans as a sequence of question-answer (QA) pairs and enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.

QAmeleon: Multilingual QA with Only 5 Examples
Priyanka Agrawal | Chris Alberti | Fantine Huot | Joshua Maynez | Ji Ma | Sebastian Ruder | Kuzman Ganchev | Dipanjan Das | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 11

The availability of large, high-quality datasets has been a major driver of recent progress in question answering (QA). Such annotated datasets, however, are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are fine-tuned, thus avoiding costly annotation. Prompt tuning the PLM with only five examples per language delivers accuracy superior to translation-based baselines; it bridges nearly 60% of the gap between an English-only baseline and a fully-supervised upper bound fine-tuned on almost 50,000 hand-labeled examples; and consistently leads to improvements compared to directly fine-tuning a QA model on labeled examples in low resource settings. Experiments on the TyDiqa-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.

2022

A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
Shashi Narayan | Gonçalo Simões | Yao Zhao | Joshua Maynez | Dipanjan Das | Michael Collins | Mirella Lapata
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural generation models (FROST, Narayan et al, 2021) that are trained to first create a composition of the output and then generate by conditioning on it and the input. Our approach avoids text degeneration by first sampling a composition in the form of an entity chain and then using beam search to generate the best possible text grounded to this entity chain. Experiments on summarization (CNN/DailyMail and XSum) and question generation (SQuAD), using existing and newly proposed automatic metrics together with human-based evaluation, demonstrate that Composition Sampling is currently the best available decoding strategy for generating diverse meaningful outputs.

2021

Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin | David Reitter | Gaurav Singh Tomar | Dipanjan Das
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence as well as more subjective or chit-chat style responses. We propose different evaluation measures to disentangle these different styles of responses by quantifying the informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the usage of additional controls at decoding time using resampling techniques. In addition to automatic metrics, we perform a human evaluation study where raters judge the output of these controlled generation models to be generally more objective and faithful to the evidence compared to baseline dialogue systems.

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann | Tosin Adewumi | Karmanya Aggarwal | Pawan Sasanka Ammanamanchi | Anuoluwapo Aremu | Antoine Bosselut | Khyathi Raghavi Chandu | Miruna-Adriana Clinciu | Dipanjan Das | Kaustubh Dhole | Wanyu Du | Esin Durmus | Ondřej Dušek | Chris Chinenye Emezue | Varun Gangal | Cristina Garbacea | Tatsunori Hashimoto | Yufang Hou | Yacine Jernite | Harsh Jhamtani | Yangfeng Ji | Shailza Jolly | Mihir Kale | Dhruv Kumar | Faisal Ladhak | Aman Madaan | Mounica Maddela | Khyati Mahajan | Saad Mahamood | Bodhisattwa Prasad Majumder | Pedro Henrique Martins | Angelina McMillan-Major | Simon Mille | Emiel van Miltenburg | Moin Nadeem | Shashi Narayan | Vitaly Nikolaev | Andre Niyongabo Rubungo | Salomey Osei | Ankur Parikh | Laura Perez-Beltrachini | Niranjan Ramesh Rao | Vikas Raunak | Juan Diego Rodriguez | Sashank Santhanam | João Sedoc | Thibault Sellam | Samira Shaikh | Anastasia Shimorina | Marco Antonio Sobrevilla Cabezudo | Hendrik Strobelt | Nishant Subramani | Wei Xu | Diyi Yang | Akhila Yerukola | Jiawei Zhou
Proceedings of the First Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.

Decontextualization: Making Sentences Stand-Alone
Eunsol Choi | Jennimaria Palomaki | Matthew Lamm | Tom Kwiatkowski | Dipanjan Das | Michael Collins
Transactions of the Association for Computational Linguistics, Volume 9

Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user-facing task, and as preprocessing for systems that perform document understanding. We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.

2020

Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Junghyun Min | R. Thomas McCoy | Dipanjan Das | Emily Pitler | Tal Linzen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model’s limitations, but rather by the paucity of crowdsourced NLI examples that might convey the importance of syntactic structure at the fine-tuning stage. We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus. The best-performing augmentation method, subject/object inversion, improved BERT’s accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set. This improvement generalized beyond the particular construction used for data augmentation, suggesting that augmentation causes BERT to recruit abstract syntactic representations.

BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam | Dipanjan Das | Ankur Parikh
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Text generation has made significant advances in the last few years. Yet, evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate poorly with human judgment. We propose BLEURT, a learned evaluation metric for English based on BERT. BLEURT can model human judgment with a few thousand possibly biased training examples. A key aspect of our approach is a novel pre-training scheme that uses millions of synthetic examples to help the model generalize. BLEURT provides state-of-the-art results on the last three years of the WMT Metrics shared task and the WebNLG data set. In contrast to a vanilla BERT-based approach, it yields superior results even when the training data is scarce and out-of-distribution.

ToTTo: A Controlled Table-To-Text Generation Dataset
Ankur Parikh | Xuezhi Wang | Sebastian Gehrmann | Manaal Faruqui | Bhuwan Dhingra | Diyi Yang | Dipanjan Das
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.

Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
Thibault Sellam | Amy Pu | Hyung Won Chung | Sebastian Gehrmann | Qijun Tan | Markus Freitag | Dipanjan Das | Ankur Parikh
Proceedings of the Fifth Conference on Machine Translation

The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published metric which uses transfer learning. We extend the metric beyond English and evaluate it on 14 language pairs for which fine-tuning data is available, as well as 4 “zero-shot” language pairs, for which we have no labelled examples. Additionally, we focus on English to German and demonstrate how to combine BLEURT’s predictions with those of YiSi and use alternative reference translations to enhance the performance. Empirical results show that the models achieve competitive results on the WMT Metrics 2019 Shared Task, indicating their promise for the 2020 edition.

2019

Text Generation with Exemplar-based Adaptive Decoding
Hao Peng | Ankur Parikh | Manaal Faruqui | Bhuwan Dhingra | Dipanjan Das
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a novel conditioned text generation model. It draws inspiration from traditional template-based text generation techniques, where the source provides the content (i.e., what to say), and the template influences how to say it. Building on the successful encoder-decoder paradigm, it first encodes the content representation from the given input text; to produce the output, it retrieves exemplar text from the training data as “soft templates,” which are then used to construct an exemplar-specific decoder. We evaluate the proposed model on abstractive text summarization and data-to-text generation. Empirical results show that this model achieves strong performance and outperforms comparable baselines.

BERT Rediscovers the Classical NLP Pipeline
Ian Tenney | Dipanjan Das | Ellie Pavlick
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference. Qualitative analysis reveals that the model can and often does adjust this pipeline dynamically, revising lower-level decisions on the basis of disambiguating information from higher-level representations.

Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra | Manaal Faruqui | Ankur Parikh | Ming-Wei Chang | Dipanjan Das | William Cohen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments when those references diverge. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. Through a large scale human evaluation study of table-to-text models for WikiBio, we show that PARENT correlates with human judgments better than existing text generation metrics. We also adapt and evaluate the information extraction based evaluation proposed by Wiseman et al (2017), and show that PARENT has comparable correlation to it, while being easier to use. We show that PARENT is also applicable when the reference texts are elicited from humans using the data from the WebNLG challenge.

2018

WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse
Manaal Faruqui | Ellie Pavlick | Ian Tenney | Dipanjan Das
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We release a corpus of 43 million atomic edits across 8 languages. These edits are mined from Wikipedia edit history and consist of instances in which a human editor has inserted a single contiguous phrase into, or deleted a single contiguous phrase from, an existing sentence. We use the collected data to show that the language generated during editing differs from the language that we observe in standard corpora, and that models trained on edits encode different aspects of semantics and discourse than models trained on raw text. We release the full corpus as a resource to aid ongoing research in semantics, discourse, and representation learning.
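The definition of an atomic edit given above (one contiguous phrase inserted into, or deleted from, an existing sentence) can be illustrated with a minimal sketch. This is not the authors' extraction pipeline: the function name `atomic_edit` and the whitespace tokenization are assumptions made purely for illustration.

```python
from typing import Optional, Tuple


def atomic_edit(before: str, after: str) -> Optional[Tuple[str, str]]:
    """Return (kind, phrase) if `after` differs from `before` by exactly one
    contiguous inserted or deleted phrase, else None.

    Assumption: sentences are compared as whitespace-separated tokens.
    """
    a, b = before.split(), after.split()
    if len(a) == len(b):
        return None  # same length: neither a pure insertion nor a pure deletion
    short, long_ = (a, b) if len(a) < len(b) else (b, a)

    # Length of the longest common prefix.
    p = 0
    while p < len(short) and short[p] == long_[p]:
        p += 1

    # Length of the longest common suffix that does not overlap the prefix.
    s = 0
    while s < len(short) - p and short[-1 - s] == long_[-1 - s]:
        s += 1

    # The shorter sentence must be fully covered by prefix + suffix.
    if p + s != len(short):
        return None

    phrase = " ".join(long_[p:len(long_) - s])
    kind = "insertion" if len(b) > len(a) else "deletion"
    return kind, phrase


print(atomic_edit("She moved to Paris", "She moved to Paris in 2010"))
print(atomic_edit("The old bridge collapsed", "The bridge collapsed"))
print(atomic_edit("a b c", "x a b c y"))
```

The first call reports an insertion of "in 2010", the second a deletion of "old", and the third returns None because two separate phrases were added.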

Learning To Split and Rephrase From Wikipedia Edit History
Jan A. Botha | Manaal Faruqui | John Alex | Jason Baldridge | Dipanjan Das
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia’s edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.

Identifying Well-formed Natural Language Questions
Manaal Faruqui | Dipanjan Das
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Understanding search queries is a hard problem as it involves dealing with “word salad” text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.

2017

Neural Paraphrase Identification of Questions with Noisy Pretraining
Gaurav Singh Tomar | Thyago Duque | Oscar Täckström | Jakob Uszkoreit | Dipanjan Das
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (replacing the word embeddings of the decomposable attention model of Parikh et al. 2016 with character n-gram representations) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.

2016

A Decomposable Attention Model for Natural Language Inference
Ankur Parikh | Oscar Täckström | Dipanjan Das | Jakob Uszkoreit
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Transforming Dependency Structures to Logical Forms for Semantic Parsing
Siva Reddy | Oscar Täckström | Michael Collins | Tom Kwiatkowski | Dipanjan Das | Mark Steedman | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 4

The strongly typed syntax of grammar formalisms such as CCG, TAG, LFG and HPSG offers a synchronous framework for deriving syntactic structures and semantic logical forms. In contrast, partly due to the lack of a strong type system, dependency structures are easy to annotate and have become a widely used form of syntactic analysis for many languages. However, the lack of a type system makes a formal mechanism for deriving logical forms from dependency structures challenging. We address this by introducing a robust system based on the lambda calculus for deriving neo-Davidsonian logical forms from dependency trees. These logical forms are then used for semantic parsing of natural language to Freebase. Experiments on the Free917 and WebQuestions datasets show that our representation is superior to the original dependency trees and that it outperforms a CCG-based representation on this task. Compared to prior work, we obtain the strongest result to date on Free917 and competitive results on WebQuestions.

Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP
Dipanjan Das | Chris Dyer | Manaal Faruqui | Yulia Tsvetkov
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

2015

Semantic Role Labeling with Neural Network Factors
Nicholas FitzGerald | Oscar Täckström | Kuzman Ganchev | Dipanjan Das
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Efficient Inference and Structured Learning for Semantic Role Labeling
Oscar Täckström | Kuzman Ganchev | Dipanjan Das
Transactions of the Association for Computational Linguistics, Volume 3

We present a dynamic programming algorithm for efficient constrained inference in semantic role labeling. The algorithm tractably captures a majority of the structural constraints examined by prior work in this area, which has resorted to either approximate methods or off-the-shelf integer linear programming solvers. In addition, it allows training a globally-normalized log-linear model with respect to constrained conditional likelihood. We show that the dynamic program is several times faster than an off-the-shelf integer linear programming solver, while reaching the same solution. Furthermore, we show that our structured model results in significant improvements over its local counterpart, achieving state-of-the-art results on both PropBank- and FrameNet-annotated corpora.

2014

Learning Compact Lexicons for CCG Semantic Parsing
Yoav Artzi | Dipanjan Das | Slav Petrov
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Frame-Semantic Parsing
Dipanjan Das | Desai Chen | André F. T. Martins | Nathan Schneider | Noah A. Smith
Computational Linguistics, Volume 40, Issue 1 - March 2014

Semantic Frame Identification with Distributed Word Representations
Karl Moritz Hermann | Dipanjan Das | Jason Weston | Kuzman Ganchev
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Enhanced Search with Wildcards and Morphological Inflections in the Google Books Ngram Viewer
Jason Mann | David Zhang | Lu Yang | Dipanjan Das | Slav Petrov
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Statistical Models for Frame-Semantic Parsing
Dipanjan Das
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

2013

Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
Kuzman Ganchev | Dipanjan Das
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Universal Dependency Annotation for Multilingual Parsing
Ryan McDonald | Joakim Nivre | Yvonne Quirmbach-Brundage | Yoav Goldberg | Dipanjan Das | Kuzman Ganchev | Keith Hall | Slav Petrov | Hao Zhang | Oscar Täckström | Claudia Bedini | Núria Bertomeu Castelló | Jungmee Lee
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
Oscar Täckström | Dipanjan Das | Slav Petrov | Ryan McDonald | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 1

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.
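As a reading aid for the "25% relative error reduction" figure, the sketch below shows how such a number is conventionally computed from two accuracies. The function name and the accuracy values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the baseline error removed by the new system.

    Both arguments are accuracies in percent (0 to 100).
    """
    baseline_error = 100.0 - baseline_accuracy
    if baseline_error <= 0:
        raise ValueError("baseline has no error left to reduce")
    new_error = 100.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers: going from 80% to 85% accuracy cuts the error
# from 20 points to 15 points, i.e. a quarter of the baseline error.
print(relative_error_reduction(80.0, 85.0))
```

Note that a relative error reduction is measured against the remaining error, so the same absolute gain counts for more when the baseline is already strong.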

2012

A Universal Part-of-Speech Tagset
Slav Petrov | Dipanjan Das | Ryan McDonald
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.

Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties
Dipanjan Das | Noah A. Smith
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints
Dipanjan Das | André F. T. Martins | Noah A. Smith
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Shay B. Cohen | Dipanjan Das | Noah A. Smith
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Dipanjan Das | Slav Petrov
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
Dipanjan Das | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Kevin Gimpel | Nathan Schneider | Brendan O’Connor | Dipanjan Das | Daniel Mills | Jacob Eisenstein | Michael Heilman | Dani Yogatama | Jeffrey Flanigan | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Movie Reviews and Revenues: An Experiment in Text Regression
Mahesh Joshi | Dipanjan Das | Kevin Gimpel | Noah A. Smith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Probabilistic Frame-Semantic Parsing
Dipanjan Das | Nathan Schneider | Desai Chen | Noah A. Smith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

SEMAFOR: Frame Argument Resolution with Log-Linear Models
Desai Chen | Nathan Schneider | Dipanjan Das | Noah A. Smith
Proceedings of the 5th International Workshop on Semantic Evaluation

Distributed Asynchronous Online Learning for Natural Language Processing
Kevin Gimpel | Dipanjan Das | Noah A. Smith
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

2009

Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition
Dipanjan Das | Noah A. Smith
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Non-textual Event Summarization by Applying Machine Learning to Template-based Language Generation
Mohit Kumar | Dipanjan Das | Sachin Agarwal | Alexander Rudnicky
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2008

Stacking Dependency Parsers
André F. T. Martins | Dipanjan Das | Noah A. Smith | Eric P. Xing
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Automatic Extraction of Briefing Templates
Dipanjan Das | Mohit Kumar | Alexander I. Rudnicky
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Co-authors

Oscar Täckström 7

Mirella Lapata 6

Michael Collins 5

Joshua Maynez 5

Shashi Narayan 5

Sebastian Gehrmann 4

Nathan Schneider 4

Thibault Sellam 4

Reinald Kim Amplayo 3

Bhuwan Dhingra 3

André F. T. Martins 3

Ryan McDonald 3

Vitaly Nikolaev 3

Gaurav Singh Tomar 3

Priyanka Agrawal 2

Chris Alberti 2

Tom Kwiatkowski 2

Ellie Pavlick 2

Hannah Rashkin 2

David Reitter 2

Alexander Rudnicky 2

Anders Sandholm 2

Jakob Uszkoreit 2

Tosin Adewumi 1

Sachin Agarwal 1

Karmanya Aggarwal 1

Pawan Sasanka Ammanamanchi 1

Anuoluwapo Aremu 1

Jason Baldridge 1

Claudia Bedini 1

Jonathan Berant 1

Núria Bertomeu 1

Antoine Bosselut 1

Khyathi Raghavi Chandu 1

Ming-Wei Chang 1

Hyung Won Chung 1

Elizabeth Clark 1

Miruna Clinciu 1

Shay B. Cohen 1

William Cohen 1

Kaustubh Dhole 1

Ondřej Dušek 1

Jacob Eisenstein 1

Chris Chinenye Emezue 1

Nicholas Fitzgerald 1

Jeffrey Flanigan 1

Markus Freitag 1

Cristina Garbacea 1

Yoav Goldberg 1

Tatsunori B. Hashimoto 1

Michael Heilman 1

Karl Moritz Hermann 1

Yacine Jernite 1

Harsh Jhamtani 1

Shailza Jolly 1

Faisal Ladhak 1

Annie Priyadarshini Louis 1

Mounica Maddela 1

Khyati Mahajan 1

Saad Mahamood 1

Bodhisattwa Prasad Majumder 1

Chaitanya Malaviya 1

Pedro Henrique Martins 1

R. Thomas McCoy 1

Angelina McMillan-Major 1

Daniel P. Mills 1

Brendan O’Connor 1

Jennimaria Palomaki 1

Laura Perez-Beltrachini 1

Yvonne Quirmbach-Brundage 1

Niranjan Ramesh Rao 1

Shruti Rijhwani 1

Juan Diego Rodriguez 1

Andre Niyongabo Rubungo 1

Sebastian Ruder 1

Sashank Santhanam 1

Samira Shaikh 1

Anastasia Shimorina 1

Aditya Siddhant 1

Gonçalo Simões 1

Marco Antonio Sobrevilla Cabezudo 1

Pranesh Srinivasan 1

Mark Steedman 1

Hendrik Strobelt 1

Nishant Subramani 1

Yulia Tsvetkov 1

Emiel Van Miltenburg 1

Kellie Webster 1

Akhila Yerukola 1

Dani Yogatama 1

Venues