Debopam Das


2023

The RST Continuity Corpus
Debopam Das | Markus Egg
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

We present the RST Continuity Corpus (RST-CC), a corpus of discourse relations annotated for continuity dimensions. Continuity or discontinuity (maintaining or shifting deictic centres across discourse segments) is an important property of discourse relations, but the two are correlated in greatly varying ways. To analyse this correlation, the relations in the RST-CC are annotated using operationalised versions of Givón’s (1993) continuity dimensions. We also report on the inter-annotator agreement, and discuss recurrent annotation issues. First results show substantial variation of continuity dimensions within and across relation types.

Testing the Continuity Hypothesis: A decompositional approach
Debopam Das | Markus Egg
Proceedings of the 4th Conference on Language, Data and Knowledge

2020

DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives
Debopam Das | Manfred Stede | Soumya Sankar Ghosh | Lahari Chatterjee
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present DiMLex-Bangla, a newly developed lexicon of discourse connectives in Bangla. The lexicon, upon completion of its first version, contains 123 Bangla connective entries, which are primarily compiled from the linguistic literature and translation of English discourse connectives. The lexicon compilation is later augmented by adding more connectives from a corpus currently under development, the Bangla RST Discourse Treebank (Das and Stede, 2018). DiMLex-Bangla provides information on syntactic categories of Bangla connectives, their discourse semantics and non-connective uses (if any). It uses the format of the German connective lexicon DiMLex (Stede and Umbach, 1998), which provides a cross-linguistically applicable XML schema. The resource is the first of its kind in Bangla, and is freely available for use in studies on discourse structure and computational applications.

2019

Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Amir Zeldes | Debopam Das | Erick Galani Maziero | Juliano Desiderato Antonio | Mikel Iruskieta
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Introduction to Discourse Relation Parsing and Treebanking (DISRPT): 7th Workshop on Rhetorical Structure Theory and Related Formalisms
Amir Zeldes | Debopam Das | Erick Galani Maziero | Juliano Antonio | Mikel Iruskieta
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

This overview summarizes the main contributions of the accepted papers at the 2019 workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019). Co-located with NAACL 2019 in Minneapolis, the workshop’s aim was to bring together researchers working on corpus-based and computational approaches to discourse relations. In addition to an invited talk, eighteen papers outlined below were presented, four of which were submitted as part of a shared task on elementary discourse unit segmentation and connective detection.

Nuclearity in RST and signals of coherence relations
Debopam Das
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We investigate the relationship between the notion of nuclearity as proposed in Rhetorical Structure Theory (RST) and the signalling of coherence relations. RST relations are categorized as either mononuclear (comprising a nucleus and a satellite span) or multinuclear (comprising two or more nucleus spans). We examine how mononuclear relations (e.g., Antithesis, Condition) and multinuclear relations (e.g., Contrast, List) are indicated by relational signals, more particularly by discourse markers (e.g., because, however, if, therefore). We conduct a corpus study, examining the distribution of both types of relations in the RST Discourse Treebank (Carlson et al., 2002) and the distribution of discourse markers for those relations in the RST Signalling Corpus (Das et al., 2015). Our results show that discourse markers are used more often to signal multinuclear relations than mononuclear relations. The findings also suggest a complex relationship between the relation types and syntactic categories of discourse markers (subordinating and coordinating conjunctions).

Annotating Shallow Discourse Relations in Twitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and its annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourses differ from written news text with respect to discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments are frequently not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

The DISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective Detection
Amir Zeldes | Debopam Das | Erick Galani Maziero | Juliano Antonio | Mikel Iruskieta
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection. In this paper, we review the data included in the task, which cover 2.6 million manually annotated tokens from 15 datasets in 10 languages, survey and compare submitted systems, and report on system performance on each task for both annotated and plain-tokenized versions of the data.

2018

Developing the Bangla RST Discourse Treebank
Debopam Das | Manfred Stede
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Constructing a Lexicon of English Discourse Connectives
Debopam Das | Tatjana Scheffler | Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We present a new lexicon of English discourse connectives called DiMLex-Eng, built by merging information from two annotated corpora and an additional list of relation signals from the literature. The format follows the German connective lexicon DiMLex, which provides a cross-linguistically applicable XML schema. DiMLex-Eng contains 149 English connectives, and gives information on syntactic categories, discourse semantics and non-connective uses (if any). We report on the development steps and discuss design decisions encountered in the lexicon expansion phase. The resource is freely available for use in studies of discourse structure and computational applications.

2017

The Good, the Bad, and the Disagreement: Complex ground truth in rhetorical structure analysis
Debopam Das | Manfred Stede | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms