An Annotated Dataset of Discourse Modes in Hindi Stories

Swapnil Dhanwal; Hritwik Dutta; Hitesh Nankani; Nilay Shrivastava; Yaman Kumar; Junyi Jessy Li; Debanjan Mahata; Rakesh Gosangi; Haimin Zhang; Rajiv Ratn Shah; Amanda Stent

Abstract

In this paper, we present a new corpus consisting of sentences from Hindi short stories annotated for five different discourse modes: argumentative, narrative, descriptive, dialogic, and informative. We present a detailed account of the entire data collection and annotation process. The annotations have a very high inter-annotator agreement (0.87 k-alpha). We analyze the data in terms of label distributions, part-of-speech tags, and sentence lengths. We characterize the performance of various classification algorithms on this dataset and perform ablation studies to understand the nature of the linguistic models suitable for capturing the nuances of the embedded discourse structures in the presented corpus.

Anthology ID: 2020.lrec-1.149
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1191–1196
Language: English
URL: https://aclanthology.org/2020.lrec-1.149/
Cite (ACL): Swapnil Dhanwal, Hritwik Dutta, Hitesh Nankani, Nilay Shrivastava, Yaman Kumar, Junyi Jessy Li, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah, and Amanda Stent. 2020. An Annotated Dataset of Discourse Modes in Hindi Stories. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1191–1196, Marseille, France. European Language Resources Association.
Cite (Informal): An Annotated Dataset of Discourse Modes in Hindi Stories (Dhanwal et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.149.pdf