Hindi TimeBank: An ISO-TimeML Annotated Reference Corpus

Pranav Goel, Suhan Prabhu, Alok Debnath, Priyank Modi, Manish Shrivastava


Abstract
ISO-TimeML is an international standard for multilingual event annotation, detection, categorization and linking. In this paper, we present the Hindi TimeBank, an ISO-TimeML annotated reference corpus for the detection and classification of events, states and time expressions, and the links between them. Based on contemporary developments in Hindi event recognition, we propose language-independent and language-specific deviations from the ISO-TimeML guidelines while preserving the schema. These deviations include recording annotator confidence and an independent mechanism for identifying and annotating states (such as copular and existential constructions). The Hindi TimeBank is released as an open-source corpus of 1,000 articles, containing over 25,000 events, 3,500 states and 2,000 time expressions. We analyze the dataset in detail and provide a class-wise distribution of events, states and time expressions. Our guidelines and dataset are supported by high average inter-annotator agreement scores.
Anthology ID: 2020.isa-1.2
Volume: Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation
Month: May
Year: 2020
Address: Marseille
Editor: Harry Bunt
Venue: ISA
Publisher: European Language Resources Association
Pages: 13–21
Language: English
URL: https://aclanthology.org/2020.isa-1.2
Cite (ACL): Pranav Goel, Suhan Prabhu, Alok Debnath, Priyank Modi, and Manish Shrivastava. 2020. Hindi TimeBank: An ISO-TimeML Annotated Reference Corpus. In Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation, pages 13–21, Marseille. European Language Resources Association.
Cite (Informal): Hindi TimeBank: An ISO-TimeML Annotated Reference Corpus (Goel et al., ISA 2020)
PDF: https://aclanthology.org/2020.isa-1.2.pdf