SNaRe: Domain-aware Data Generation for Low-Resource Event Detection

Tanmay Parekh; Yuxuan Dong; Lucas Bandarkar; Artin Kim; I-Hung Hsu; Kai-Wei Chang; Nanyun Peng

doi:10.18653/v1/2025.emnlp-main.1039

Abstract

Event Detection (ED) – the task of identifying event mentions in natural language text – is critical for enabling reasoning in highly specialized domains such as biomedicine, law, and epidemiology. Data generation has proven effective in broadening its utility to wider applications without requiring expensive expert annotations. However, when existing generation approaches are applied to specialized domains, they struggle with label noise, where annotations are incorrect, and domain drift, characterized by a distributional mismatch between generated sentences and the target domain. To address these issues, we introduce SNaRe, a domain-aware synthetic data generation framework composed of three components: Scout, Narrator, and Refiner. Scout extracts triggers from unlabeled target domain data and curates a high-quality domain-specific trigger list using corpus-level statistics to mitigate domain drift. Narrator, conditioned on these triggers, generates high-quality domain-aligned sentences, and Refiner identifies additional event mentions, ensuring high annotation quality. Experiments on three ED datasets from diverse domains show that SNaRe outperforms the best baseline, achieving average F1 gains of 3-7% in the zero-shot/few-shot settings and 4-20% F1 improvement for multilingual generation. An analysis of the generated trigger hit rate, together with human evaluation, substantiates SNaRe’s stronger annotation quality and reduced domain drift.
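The three-stage flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names (`scout`, `snare_pipeline`, `toy_extract`, `toy_generate`), the use of a plain frequency threshold (`min_count`) as the "corpus-level statistic", and the replacement of model calls with simple callables. The abstract does not specify any of these details.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Mention = Tuple[str, str]  # (event_type, trigger_word); hypothetical representation


def scout(unlabeled: List[str],
          extract: Callable[[str], List[Mention]],
          min_count: int = 2) -> Dict[str, List[str]]:
    """Scout stage (sketch): extract triggers from unlabeled target-domain
    text and keep those seen in at least `min_count` sentences.
    The frequency threshold is an assumed stand-in for the paper's
    corpus-level statistics."""
    counts: Counter = Counter()
    for sentence in unlabeled:
        counts.update(set(extract(sentence)))  # count once per sentence
    curated: Dict[str, List[str]] = {}
    for (event_type, trigger), n in counts.items():
        if n >= min_count:
            curated.setdefault(event_type, []).append(trigger)
    return {etype: sorted(words) for etype, words in curated.items()}


def snare_pipeline(unlabeled: List[str],
                   extract: Callable[[str], List[Mention]],
                   generate: Callable[[str, str], str],
                   annotate: Callable[[str], List[Mention]],
                   min_count: int = 2) -> List[dict]:
    """Scout -> Narrator -> Refiner, with each stage passed in as a callable."""
    triggers = scout(unlabeled, extract, min_count)
    dataset = []
    for event_type, words in sorted(triggers.items()):
        for trigger in words:
            # Narrator (sketch): generate a sentence conditioned on the trigger.
            sentence = generate(event_type, trigger)
            # Refiner (sketch): add any further mentions found in the sentence.
            mentions = {(event_type, trigger)} | set(annotate(sentence))
            dataset.append({"sentence": sentence, "mentions": sorted(mentions)})
    return dataset


# Toy stand-ins for the model-backed components (purely illustrative).
def toy_extract(sentence: str) -> List[Mention]:
    lexicon = {"infected": "Infect", "died": "Death"}
    words = [w.strip(".,").lower() for w in sentence.split()]
    return [(lexicon[w], w) for w in words if w in lexicon]


def toy_generate(event_type: str, trigger: str) -> str:
    return f"Several people were {trigger} and one died."


corpus = [
    "Two patients were infected.",
    "Many were infected last week.",
    "One patient died.",
]
data = snare_pipeline(corpus, toy_extract, toy_generate, toy_extract)
print(data)
```

In this toy run, "died" appears in only one corpus sentence, so it falls below the assumed threshold and is not used as a generation seed. It still ends up annotated because the Refiner step finds it in the generated sentence, which mirrors the abstract's description of Refiner identifying additional event mentions.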

Anthology ID: 2025.emnlp-main.1039
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 20583–20604
URL: https://aclanthology.org/2025.emnlp-main.1039/
DOI: 10.18653/v1/2025.emnlp-main.1039
Cite (ACL): Tanmay Parekh, Yuxuan Dong, Lucas Bandarkar, Artin Kim, I-Hung Hsu, Kai-Wei Chang, and Nanyun Peng. 2025. SNaRe: Domain-aware Data Generation for Low-Resource Event Detection. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20583–20604, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): SNaRe: Domain-aware Data Generation for Low-Resource Event Detection (Parekh et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1039.pdf
Checklist: 2025.emnlp-main.1039.checklist.pdf
