Benchmarking Models for Low-Resource Nepali Event Extraction with Trigger Phrase Identification and Event Classification

Sujal Maharjan; Astha Shrestha; Lakshmojee Koduru; Sweta Poudel; Shuvam Shiwakoti; Rabin Thapa; Kritesh Rauniyar; Surendrabikram Thapa

Abstract

Research on Event Extraction (EE) in South Asian languages is crucial for understanding information dissemination and enabling automated news analysis in morphologically complex, low-resource environments. To address the scarcity of high-quality, publicly available datasets, we present Nepali Event Extraction (NepEE), a manually annotated corpus of 10,226 Devanagari sentences. The dataset includes annotations for trigger spans and event types and achieves high inter-annotator agreement (Fleiss’ κ = 0.812 for trigger identification and κ = 0.855 for event classification). It was developed through a rigorous, iterative three-phase protocol involving five expert native speakers to ensure linguistic precision. We benchmark a broad spectrum of approaches, including classical feature-based models, five fine-tuned Transformer encoders, and contemporary instruction-tuned Large Language Models (LLMs) with zero-shot and fixed few-shot prompting. Our analysis shows that Indic-specialized Transformers achieve superior classification performance, while traditional methods and few-shot prompting struggle with exact span extraction in morphologically complex contexts. We further quantify the performance gap between sentence-level and span-level tasks, providing strong baselines for future research. The findings and the released NepEE dataset provide a valuable resource for advancing event understanding in low-resource languages (LRLs). All code and resources are available at https://github.com/SUJAL390/EEUCA-ACL-2026-Trigger-Phrase-Identification-and-Event-Classification-in-Low-Resource-Languages.
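
The agreement figures above are reported as Fleiss’ κ. As a rough illustration of what that statistic measures, the following is a minimal sketch of the standard Fleiss’ κ formula: mean per-item observed agreement compared against chance agreement derived from the marginal category proportions. It is not the authors' code. The function name fleiss_kappa and the input layout (one row per annotated item, one column per category, each cell counting how many annotators chose that category) are assumptions for illustration, and how NepEE's trigger spans were mapped to categories for agreement is not described here.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Standard Fleiss' kappa (illustrative sketch, hypothetical helper).

    counts[i][j] = number of annotators who assigned item i to category j.
    Every item must be rated by the same number of annotators n >= 2.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    k = len(counts[0])

    # Observed agreement for each item: fraction of annotator pairs that agree.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / N

    # Chance agreement from the overall proportion of labels in each category.
    p_cats = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_bar - p_e) / (1 - p_e)


# Usage: two items, five annotators, perfect agreement on different categories.
print(fleiss_kappa([[5, 0], [0, 5]]))  # 1.0
```

Values near 1 indicate agreement well beyond what the label distribution alone would produce, which is why κ ≈ 0.81 to 0.86 is conventionally read as strong agreement.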

Anthology ID: 2026.eeuca-1.7
Volume: Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ali Hürriyetoğlu, Surendrabikram Thapa, Hristo Tanev
Venues: EEUCA | WS
Publisher: Association for Computational Linguistics
Pages: 58–71
URL: https://aclanthology.org/2026.eeuca-1.7/
Cite (ACL): Sujal Maharjan, Astha Shrestha, Lakshmojee Koduru, Sweta Poudel, Shuvam Shiwakoti, Rabin Thapa, Kritesh Rauniyar, and Surendrabikram Thapa. 2026. Benchmarking Models for Low-Resource Nepali Event Extraction with Trigger Phrase Identification and Event Classification. In Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026), pages 58–71, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Benchmarking Models for Low-Resource Nepali Event Extraction with Trigger Phrase Identification and Event Classification (Maharjan et al., EEUCA 2026)
PDF: https://aclanthology.org/2026.eeuca-1.7.pdf
