An Annotated Corpus for Realis Event Detection in Short Stories Written in English and Low Resource Assamese Language

Kirti Chaitanya, Choudhury Pankaj, An Ashish, Guha Prithwijit


Abstract
This paper presents an annotated corpus of Assamese and English short stories for event trigger detection. This marks a pioneering endeavor in short stories, contributing to the development of resources for this genre, especially in the low-resource Assamese language. In the process, 200 short stories were manually annotated in both Assamese and English. The dataset was evaluated, and several models were compared for predicting events that are actually happening, i.e., realis events. However, manually annotated language resources are expensive to develop, especially when the text requires specialist knowledge to interpret. In this regard, TagIT, an automated event annotation tool, is introduced. TagIT is designed to facilitate our objective of expanding the dataset from 200 to 1,000 stories. The best-performing model was employed in TagIT to automate the event annotation process. Extensive experiments were conducted to evaluate the quality of the expanded dataset. This study further illustrates how combining an automatic annotation tool with human-in-the-loop participation significantly reduces the time needed to generate a high-quality dataset.
Anthology ID:
2023.icon-1.8
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
72–81
URL:
https://aclanthology.org/2023.icon-1.8
Cite (ACL):
Kirti Chaitanya, Choudhury Pankaj, An Ashish, and Guha Prithwijit. 2023. An Annotated Corpus for Realis Event Detection in Short Stories Written in English and Low Resource Assamese Language. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 72–81, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
An Annotated Corpus for Realis Event Detection in Short Stories Written in English and Low Resource Assamese Language (Chaitanya et al., ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.8.pdf