OmniEvent: A Comprehensive, Fair, and Easy-to-Use Toolkit for Event Understanding

Event understanding aims at understanding the content and relationship of events within texts, which covers multiple complicated information extraction tasks: event detection, event argument extraction, and event relation extraction. To facilitate related research and application, we present an event understanding toolkit OmniEvent, which features three desiderata: (1) Comprehensive. OmniEvent supports mainstream modeling paradigms of all the event understanding tasks and the processing of 15 widely-used English and Chinese datasets. (2) Fair. OmniEvent carefully handles the inconspicuous evaluation pitfalls reported in Peng et al. (2023), which ensures fair comparisons between different models. (3) Easy-to-use. OmniEvent is designed to be easily used by users with varying needs. We provide off-the-shelf models that can be directly deployed as web services. The modular framework also enables users to easily implement and evaluate new event understanding models with OmniEvent. The toolkit (https://github.com/THU-KEG/OmniEvent) is publicly released along with the demonstration website and video (https://omnievent.xlore.cn/).


Introduction
Correctly understanding events is fundamental for humans to understand the world. Event understanding requires identifying real-world events mentioned in texts and analyzing their relationships, which naturally benefits various downstream applications, such as stock prediction (Ding et al., 2015), adverse drug event detection (Wunnava et al., 2019), narrative event prediction (Wang et al., 2021a), and legal case analysis (Yao et al., 2022).
As illustrated in Figure 1, event understanding covers three complicated information extraction tasks: (1) event detection (ED), which is to detect the event triggers (keywords or phrases evoking events in texts) and classify their event types, (2) event argument extraction (EAE), which is to extract the event arguments for each trigger and classify their argument roles, and (3) event relation extraction (ERE), which is to identify the complex relationships between events, typically including temporal, causal, coreference, and subevent relations. ED and EAE together constitute the conventional event extraction (EE) task.
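To make the outputs of the three tasks concrete, the following minimal sketch represents triggers, arguments, and relations as plain Python data classes. The class names, field names, and the event types, roles, and relation label attached to the example sentence are illustrative assumptions, not OmniEvent's data format or any dataset's gold annotation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[int, int]  # character offsets, end exclusive


@dataclass
class Argument:
    text: str
    role: str   # argument role, e.g. "Place" (illustrative label)
    span: Span


@dataclass
class Event:
    trigger: str      # output of ED: the trigger word or phrase
    event_type: str   # output of ED: the event type of the trigger
    span: Span
    arguments: List[Argument] = field(default_factory=list)  # output of EAE


@dataclass
class Relation:
    head: int           # index of the head event in the event list
    tail: int           # index of the tail event in the event list
    relation_type: str  # output of ERE, e.g. a temporal or causal label


def span_of(text: str, phrase: str) -> Span:
    """Return the character span of the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase))


TEXT = ("U.S. and British troops were moving on the strategic southern port "
        "city of Basra Saturday after a massive aerial assault pounded Baghdad at dawn")

# Hypothetical annotations, chosen only to illustrate the structure.
events = [
    Event("moving", "Transport", span_of(TEXT, "moving"), [
        Argument("U.S. and British troops", "Artifact",
                 span_of(TEXT, "U.S. and British troops")),
        Argument("Basra", "Destination", span_of(TEXT, "Basra")),
    ]),
    Event("assault", "Attack", span_of(TEXT, "assault"), [
        Argument("Baghdad", "Place", span_of(TEXT, "Baghdad")),
    ]),
]
# The assault (event 1) happens before the troop movement (event 0).
relations = [Relation(head=1, tail=0, relation_type="BEFORE")]

for event in events:
    roles = [(arg.role, arg.text) for arg in event.arguments]
    print(event.event_type, event.trigger, event.span, roles)
```

In this representation, ED fills `trigger` and `event_type`, EAE fills `arguments`, and ERE produces the `Relation` records that link events.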
In recent years, event understanding research has grown rapidly (Ma et al., 2022; Wang et al., 2022; Yue et al., 2023; Huang et al., 2023), and multiple practical systems (Wadden et al., 2019; Lin et al., 2020; Zhang et al., 2020; Du et al., 2022; Zhang et al., 2022) have been developed. However, as shown in Table 1, existing systems exhibit several non-negligible issues: (1) Incomprehensive Tasks. Existing systems mainly focus on the two EE subtasks and rarely cover the whole event understanding pipeline with ERE tasks. The notable exception, EventPlus (Ma et al., 2021), covers only temporal relations. (2) Limited Support for Redevelopment and Evaluation. Most of the existing event understanding systems are highly integrated and not extensible, which means users cannot easily develop new models within their frameworks. Especially considering the recent rise of large language models (LLMs), adequate support for LLMs is urgent but often missing. Moreover, the complicated data processing and evaluation details often lead to inconsistent and unfair evaluation results (Peng et al., 2023), but existing systems do not pay much attention to evaluation.
To address these issues, we develop OmniEvent, a comprehensive, fair, and easy-to-use toolkit for event understanding, which has three main features: (1) Comprehensive Support for Task, Model, and Dataset. OmniEvent supports end-to-end event understanding from plain texts, i.e., all the ED, EAE, and ERE tasks. For ED and EAE, we classify the mainstream methods into four paradigms: classification, sequence labeling, span prediction, and conditional generation. We implement various representative methods for each paradigm. For ERE, we provide a unified modeling framework and implement a basic pairwise classification method (Wang et al., 2022). We also cover the preprocessing of 15 widely-used English and Chinese datasets. (2) Fair Evaluation. As found in Peng et al. (2023), there are three major pitfalls hidden in EE evaluation: data processing discrepancy, output space discrepancy, and absence of pipeline evaluation. OmniEvent implements all the proposed remedies to help users avoid them. Specifically, we implement unified pre-processing for all the datasets and a method to convert the predictions of different paradigms into a unified space. OmniEvent also provides unified predicted triggers of supported datasets for fair pipeline comparisons. (3) Easy-to-Use for Various Needs. We design a modular and extensible framework for OmniEvent, which appeals to users with various needs. We provide several off-the-shelf models that can be easily deployed and used by users interested in applications. Model developers and researchers can train implemented methods within several lines of code or customize their own models and evaluate them. By integrating Transformers (Wolf et al., 2020) and DeepSpeed (Rasley et al., 2020), OmniEvent also supports efficient fine-tuning and inference of LLMs.
To demonstrate the effectiveness of OmniEvent, we present the results of several implemented methods on widely-used benchmarks. We also conduct experiments with models at different scales and show that fine-tuning LLMs helps achieve better event understanding results. We hope OmniEvent can facilitate the research and applications of event understanding.

Table 1: Comparisons between OmniEvent and other event understanding systems. The number of supported models and datasets only includes those of event understanding tasks. N/A denotes that the system is an integrated service and does not process benchmark datasets. For OmniEvent, the module combination enables many possible models and 20 is the number of models we have tested for usability.

Related Work
With the advancement of research in NLP, various toolkits or systems for event understanding have been developed. They tend to focus on developing advanced EE systems to achieve improved results on public benchmarks (Wadden et al., 2019; Lin et al., 2020; Nguyen et al., 2021) or perform robustly in real-world scenarios (Vossen et al., 2016; Du et al., 2022). However, these toolkits or systems, designed based on a specific EE model, do not support comprehensive implementations of EE models and are inconvenient for secondary development. Some work has also meticulously designed user-friendly algorithmic frameworks (Zhang et al., 2020, 2022), which are convenient for usage and secondary development. However, they are not specifically designed for event understanding, hence the corresponding support is limited. EventPlus (Ma et al., 2021) is the only work supporting the entire event understanding pipeline, but it only supports temporal relation extraction and does not provide comprehensive implementations of event understanding models. Moreover, existing work also neglects the discrepancies in EE evaluation mentioned in Peng et al. (2023), which may result in unfair comparisons. Finally, in the era of LLMs, existing work (except for DeepKE) also lacks support for LLMs.
Considering the mentioned issues, we present OmniEvent, a comprehensive, fair, and easy-to-use toolkit for event understanding. Compared to other systems in Table 1, OmniEvent supports the entire event understanding pipeline and comprehensively implements various models. OmniEvent also supports efficient fine-tuning and inference of LLMs. Meanwhile, OmniEvent provides respective remedies for eliminating the discrepancies mentioned in Peng et al. (2023). With a modular implementation and several released off-the-shelf models, OmniEvent is user-friendly and easy to use.

The OmniEvent Toolkit
We introduce the overview (§ 3.1) and main features of OmniEvent (§§ 3.2 to 3.4), as well as an online demonstration (§ 3.5) powered by OmniEvent.

Overview
The overall architecture of OmniEvent is illustrated in Figure 2. OmniEvent provides a data pre-processing module for unified pre-processing. Users can either use the supported datasets or customize their own datasets. After pre-processing, OmniEvent provides a flexible modular framework for model implementation. OmniEvent abstracts and disassembles the mainstream models into three basic modules and implements the basic modules in a highly encapsulated way. By combining our provided modules or implementing their own modules, users can easily assemble a model. OmniEvent reproduces several widely-used models in this way. Finally, OmniEvent provides a fair evaluation protocol to convert predictions of different models into a unified and comparable output space.

Comprehensive Support
OmniEvent implements the entire event understanding pipeline, i.e., all the ED, EAE, and ERE tasks, and can serve as a one-stop event understanding platform. Furthermore, OmniEvent provides comprehensive coverage of models and datasets.

Fair Evaluation
As discussed in Peng et al. (2023), there exist several pitfalls in EE evaluation that significantly influence the fair comparison of different models. They fall into three aspects: data pre-processing discrepancy, output space discrepancy, and absence of pipeline evaluation. OmniEvent implements remedies for eliminating them.

    from OmniEvent.infer import infer

    # input text
    text = "U.S. and British troops were moving on the strategic southern port city of Basra Saturday after a massive aerial assault pounded Baghdad at dawn"

    # event detection
    ed_results = infer(text=text, task="ED")

    # end-to-end event extraction
    ee_results = infer(text=text, task="EE")

    # end-to-end event understanding:
    # event extraction & relation extraction
    all_results = infer(text=text, task="EE & ERE")

Code 2: Example of using the inference interface and off-the-shelf models for event understanding.

Specify data pre-processing. As the data pre-processing discrepancy mainly comes from using different processing options, OmniEvent provides all the widely-used data pre-processing scripts. Users only need to specify the pre-processing script to obtain results comparable with previous studies.
To address the output space discrepancy, following Peng et al. (2023), OmniEvent provides several easy-to-use functions to convert the predictions of different models into a unified output space. Code 1 shows the conversion code for sequence labeling, span prediction, and conditional generation predictions for event detection. Users can easily utilize the functions to obtain fair and comparable results.
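As an illustration of what such a conversion can look like for event detection, the following minimal sketch maps sequence labeling, span prediction, and conditional generation outputs onto one shared list of candidate triggers. It is not OmniEvent's actual conversion code: the function names (`bio_to_spans`, `generated_to_spans`, `to_candidate_space`), the BIO tag scheme, the `NA` label, and the rule that predictions outside the candidate list are ignored are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive
NA = "NA"               # label for "no event" (assumed name)


def bio_to_spans(tags: List[str]) -> Dict[Span, str]:
    """Decode a BIO tag sequence (sequence labeling output) into typed spans."""
    spans: Dict[Span, str] = {}
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans[(start, i)] = label
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def generated_to_spans(pairs: List[Tuple[str, str]],
                       tokens: List[str]) -> Dict[Span, str]:
    """Locate generated (trigger text, event type) pairs in the token list."""
    spans: Dict[Span, str] = {}
    for text, label in pairs:
        words = text.split()
        if not words:
            continue
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                spans[(i, i + len(words))] = label  # first match only
                break
    return spans


def to_candidate_space(pred_spans: Dict[Span, str],
                       candidates: List[Span]) -> List[str]:
    """One label per candidate trigger; unmatched candidates receive NA."""
    return [pred_spans.get(candidate, NA) for candidate in candidates]


tokens = "A massive aerial assault pounded Baghdad".split()
candidates = [(3, 4), (4, 5)]  # "assault", "pounded"

sl_output = ["O", "O", "O", "B-Attack", "O", "O"]  # sequence labeling
sp_output = {(3, 4): "Attack"}                     # span prediction
cg_output = [("assault", "Attack")]                # conditional generation

sl_labels = to_candidate_space(bio_to_spans(sl_output), candidates)
sp_labels = to_candidate_space(sp_output, candidates)
cg_labels = to_candidate_space(generated_to_spans(cg_output, tokens), candidates)
print(sl_labels, sp_labels, cg_labels)
```

Once every paradigm yields one label per candidate, the same precision, recall, and F1 computation can be applied to all of them.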

Pipeline evaluation. The pipeline evaluation requires conducting EAE based on predicted triggers. Therefore, the results of EAE models are comparable only when using the same predicted triggers. OmniEvent provides a unified set of predicted triggers for widely-used datasets. Specifically, OmniEvent leverages CLEVE (Wang et al., 2021b), an advanced ED model, to predict triggers for widely-used EE datasets: ACE 2005, KBP 2016, KBP 2017, and RichERE.
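The effect of sharing predicted triggers can be sketched as follows. This is a minimal illustration under assumed conventions, not OmniEvent's evaluation code: the record layout, the helper names, and the toy EAE model are hypothetical, and an argument is counted as correct only when its trigger span, event type, argument span, and role all match a gold record.

```python
from typing import Callable, Iterable, List, Set, Tuple

Span = Tuple[int, int]
Trigger = Tuple[Span, str]               # (trigger span, event type)
ArgRecord = Tuple[Span, str, Span, str]  # (trigger span, event type, argument span, role)


def micro_prf(gold: Set[ArgRecord], pred: Set[ArgRecord]) -> Tuple[float, float, float]:
    """Micro precision, recall, and F1 over exact-match argument records."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def run_eae_on_triggers(
    triggers: Iterable[Trigger],
    eae_model: Callable[[Span, str], List[Tuple[Span, str]]],
) -> Set[ArgRecord]:
    """Run an EAE model on predicted (not gold) triggers."""
    records: Set[ArgRecord] = set()
    for span, event_type in triggers:
        for arg_span, role in eae_model(span, event_type):
            records.add((span, event_type, arg_span, role))
    return records


def toy_eae_model(span: Span, event_type: str) -> List[Tuple[Span, str]]:
    """Hypothetical stand-in for a trained EAE model."""
    table = {
        ((3, 4), "Attack"): [((5, 6), "Place")],
        ((7, 8), "Meet"): [((0, 2), "Entity")],
    }
    return table.get((span, event_type), [])


# Every EAE model under comparison receives the same predicted triggers.
shared_predicted_triggers = [((3, 4), "Attack"), ((7, 8), "Meet")]

gold_args = {
    ((3, 4), "Attack", (5, 6), "Place"),
    ((10, 11), "Transport", (0, 2), "Artifact"),  # its trigger was missed by ED
}
pred_args = run_eae_on_triggers(shared_predicted_triggers, toy_eae_model)
precision, recall, f1 = micro_prf(gold_args, pred_args)
print(precision, recall, f1)
```

In this toy case, the argument of the missed trigger lowers recall and the argument attached to the spurious trigger lowers precision, which is why EAE scores are only comparable when the trigger set is fixed.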

Easy-to-Use
OmniEvent is designed to be user-friendly and easy to use. Specifically, OmniEvent incorporates the following designs.
Easy start with off-the-shelf models. OmniEvent provides several off-the-shelf models for event understanding. Specifically, we train multilingual T5 (Xue et al., 2021) models for ED and EAE, respectively, on the collection of included EE datasets. We also train a joint ERE model based on RoBERTa (Liu et al., 2019) on the training set of MAVEN-ERE. As shown in Code 2, OmniEvent provides an interface for inference and users can easily use these models in their applications with a few lines of code.
Modular implementation. As shown in Figure 2, OmniEvent abstracts and disassembles the mainstream models into basic modules. The backbone module implements various text encoders, such as CNN (Krizhevsky et al., 2012) and BERT (Devlin et al., 2019), to encode plain texts into low-dimensional dense vectors. The backbone module also supports LLMs such as T5-XXL (Raffel et al., 2020) and FLAN-UL2 (Tay et al., 2023). The aggregation module includes various aggregation operations, which aggregate and convert the dense vectors into representations of events, arguments, and relations. The classification module projects the representations into distributions over classification candidates. With the highly modular implementation, users can easily combine the basic modular components to develop new models.
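The three-module decomposition can be sketched with a toy, dependency-free example. Everything below is an assumption made for illustration: the names (`hashed_backbone`, `max_pooling`, `mean_pooling`, `LinearHead`, `EventModel`) are hypothetical, the backbone is a pseudo-random lookup rather than a trained encoder, and the weights are untrained, so the predicted probabilities carry no meaning. The sketch only shows how modules with matching interfaces can be swapped independently.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # token offsets, end exclusive
DIM = 8


def hashed_backbone(tokens: Sequence[str], dim: int = DIM) -> List[Vector]:
    """Toy backbone: one deterministic pseudo-random vector per token."""
    hidden = []
    for token in tokens:
        rng = random.Random(token)  # seeding with the token keeps it deterministic
        hidden.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return hidden


def max_pooling(hidden: List[Vector], span: Span) -> Vector:
    """Aggregation: element-wise maximum over the token vectors in the span."""
    rows = hidden[span[0]:span[1]]
    return [max(column) for column in zip(*rows)]


def mean_pooling(hidden: List[Vector], span: Span) -> Vector:
    """Aggregation: element-wise mean over the token vectors in the span."""
    rows = hidden[span[0]:span[1]]
    return [sum(column) / len(rows) for column in zip(*rows)]


class LinearHead:
    """Classification: linear projection followed by a softmax."""

    def __init__(self, dim: int, labels: Sequence[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.labels = list(labels)
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                        for _ in self.labels]

    def __call__(self, vector: Vector) -> Dict[str, float]:
        logits = [sum(w * x for w, x in zip(row, vector)) for row in self.weights]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(self.labels, exps)}


class EventModel:
    """A model assembled from backbone, aggregation, and classification modules."""

    def __init__(self, backbone: Callable, aggregation: Callable, head: Callable) -> None:
        self.backbone = backbone
        self.aggregation = aggregation
        self.head = head

    def predict(self, tokens: Sequence[str], span: Span) -> Dict[str, float]:
        hidden = self.backbone(tokens)
        representation = self.aggregation(hidden, span)
        return self.head(representation)


LABELS = ["NA", "Attack", "Transport"]
tokens = "A massive aerial assault pounded Baghdad".split()

model = EventModel(hashed_backbone, max_pooling, LinearHead(DIM, LABELS))
variant = EventModel(hashed_backbone, mean_pooling, LinearHead(DIM, LABELS))
print(model.predict(tokens, (3, 4)))
print(variant.predict(tokens, (2, 4)))
```

Replacing `max_pooling` with `mean_pooling` changes the model without touching the backbone or the classification head, which is the kind of recombination the modular design is meant to allow.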
Efficient support for LLMs. OmniEvent is built upon Hugging Face's Transformers (Wolf et al., 2020) and DeepSpeed (Rasley et al., 2020), an efficient deep learning optimization library. With the built-in DeepSpeed support, OmniEvent can be used to train LLMs and run inference with them efficiently by modifying only the startup shell scripts.
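As a hedged illustration of the kind of setting such a startup script might point to, the sketch below builds a small DeepSpeed configuration and prints it as JSON. The helper `build_ds_config` and the chosen values are assumptions for this example, not OmniEvent's shipped configuration. The keys themselves (`zero_optimization`, `bf16`, `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`) are standard DeepSpeed options, and the `"auto"` values rely on the Transformers integration filling them in from its own training arguments.

```python
import json


def build_ds_config(zero_stage: int = 2, use_bf16: bool = True) -> dict:
    """Return a minimal DeepSpeed configuration as a plain dictionary."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # ZeRO partitions optimizer states (stage 1), gradients (stage 2),
        # and parameters (stage 3) across GPUs to reduce memory usage.
        "zero_optimization": {"stage": zero_stage},
        "bf16": {"enabled": use_bf16},
        # "auto" is resolved by the Transformers integration at launch time.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


config = build_ds_config(zero_stage=3)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and passed to a Transformers-based training script through its `deepspeed` training argument; a higher ZeRO stage trades communication overhead for lower per-GPU memory, which matters most for the largest models.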

Online Demonstration
Besides the OmniEvent toolkit, we also develop an online demonstration system (https://omnievent.xlore.cn/) powered by OmniEvent. We train and deploy a multilingual T5 BASE model for EE and a RoBERTa BASE model for event relation extraction. The website example is shown in Figure 3. The online system supports EE based on various English and Chinese classification schemata and ERE based on the MAVEN-ERE schema. The website mainly contains three parts. The input part includes a text entry field and several options. Users can choose the language, task, and ontology (i.e., classification schema) for event understanding. The results of EE are shown in the output field with extracted triggers and arguments highlighted. The results of ERE are shown as an event knowledge graph, where a node is an event and an edge is an identified relation between events. The example in Figure 3 shows the results of end-to-end event understanding (ED, EAE, and ERE) from the input plain text.
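The event knowledge graph described above can be represented as a simple adjacency structure. The sketch below is illustrative only: the function name `build_event_graph`, the use of trigger words as node identifiers, and the relation labels are assumptions, not the demonstration system's actual data format.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (head event, relation type, tail event)


def build_event_graph(event_ids: List[str],
                      edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each event node to its outgoing (relation type, tail event) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = {event_id: [] for event_id in event_ids}
    for head, relation, tail in edges:
        if head not in graph or tail not in graph:
            raise KeyError(f"relation refers to an unknown event: {head!r} -> {tail!r}")
        graph[head].append((relation, tail))
    return graph


# Hypothetical ERE output for the example sentence shown in the demonstration.
event_ids = ["moving", "assault", "pounded"]
edges = [("assault", "BEFORE", "moving"), ("pounded", "BEFORE", "moving")]

graph = build_event_graph(event_ids, edges)
for head, outgoing in graph.items():
    for relation, tail in outgoing:
        print(f"{head} --{relation}--> {tail}")
```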

Evaluation
In this section, we conduct empirical experiments to evaluate the effectiveness of the OmniEvent toolkit on widely-used datasets.

Event Extraction
We evaluate the performance of representative EE models implemented in OmniEvent on various widely-used datasets. All the models are evaluated using the unified evaluation protocol, i.e., the output space is standardized and the results of EAE are from pipeline evaluation. The pre-processing script for ACE 2005 is the same as in Wadden et al. (2019). For EEQA, we utilize the same prompts as in the original paper for ACE 2005 and manually curate prompts for all the other datasets. The results of event detection and event argument extraction are shown in Table 3. The results demonstrate the effectiveness of OmniEvent: the implemented models achieve performance similar to that of their original implementations. OmniEvent provides all the experimental configuration files in the YAML format, which record all the hyper-parameters. Users can easily reproduce the results using the corresponding configuration files.

Event Relation Extraction
We also conduct empirical experiments to evaluate the performance of ERE models developed in OmniEvent on various widely-used datasets. As shown in Table 4, the results are on par with or slightly better than the originally reported results in Wang et al. (2022), which demonstrates the validity of ERE models in OmniEvent. We also provide configuration files containing all the hyper-parameter settings for reproduction.
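The caption of Table 4 states that coreference is evaluated with B-cubed. For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition, not OmniEvent's evaluation code. It assumes that gold and predicted clusterings cover the same mentions and that singletons are included as one-element clusters.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def b_cubed(gold_clusters: Iterable[Set[str]],
            pred_clusters: Iterable[Set[str]]) -> Tuple[float, float, float]:
    """B-cubed precision, recall, and F1, averaged over mentions."""
    gold_of: Dict[str, FrozenSet[str]] = {
        mention: frozenset(cluster) for cluster in gold_clusters for mention in cluster}
    pred_of: Dict[str, FrozenSet[str]] = {
        mention: frozenset(cluster) for cluster in pred_clusters for mention in cluster}
    if gold_of.keys() != pred_of.keys():
        raise ValueError("gold and predicted clusterings must cover the same mentions")

    precision_sum = recall_sum = 0.0
    for mention in gold_of:
        overlap = len(gold_of[mention] & pred_of[mention])
        precision_sum += overlap / len(pred_of[mention])
        recall_sum += overlap / len(gold_of[mention])

    count = len(gold_of)
    precision = precision_sum / count
    recall = recall_sum / count
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [{"e1", "e2", "e3"}, {"e4"}]
pred = [{"e1", "e2"}, {"e3", "e4"}]
precision, recall, f1 = b_cubed(gold, pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

For each mention, precision is the share of its predicted cluster that is truly coreferent with it, and recall is the share of its gold cluster that the prediction recovered; both are then averaged over all mentions.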

Experiments using LLMs
OmniEvent supports efficient fine-tuning and inference for LLMs. To examine the effectiveness and validity of LLM support in OmniEvent and investigate the performance of models at different scales, we train a series of models on several datasets. Specifically, for ED and EAE, we fine-tune FLAN-  4, which may be due to the extremely long contexts and complex output space of the ERE task. We hope the findings based on OmniEvent can inspire future research on how to better leverage LLMs for event understanding.

Conclusion and Future Work
In this paper, we present OmniEvent, a comprehensive, fair, and easy-to-use toolkit for event understanding. With the comprehensive and modular implementation, OmniEvent can help researchers and developers conveniently develop and deploy models. OmniEvent also releases several off-the-shelf models and deploys an online system for enhancing the applications of event understanding models. In the future, we will continually maintain OmniEvent to support more models and datasets.

Limitations
The major limitations of OmniEvent are threefold: (1) OmniEvent currently does not support document-level event extraction models and datasets, such as RAMS (Ebner et al., 2020) and WikiEvents (Li et al., 2021). OmniEvent also lacks support for a wider range of ERE models, such as constrained loss (Wang et al., 2020a) and ILP inference (Han et al., 2019). In the future, we will continue to maintain OmniEvent to support a broader range of models and datasets.
(2) OmniEvent currently only supports two languages, Chinese and English, and does not yet support event relation extraction in Chinese. This might constrain the widespread usage of the OmniEvent toolkit. In the future, OmniEvent will support more languages.
(3) Due to the limitations of training data and used models, the performance of our released model in practical applications is limited, especially in schemata and domains outside of the training data, such as the biomedical field. In the future, we will collect more training data and utilize advanced methods to develop more powerful and general models for event understanding.

Ethical Considerations
We discuss the ethical considerations and broader impact of this work here: (1) Intellectual property. OmniEvent is open-sourced and released under the MIT license. We adhere to the original licenses for all datasets and models used. Regarding the issue of data copyright, we do not provide the original data and we only provide processing scripts for the original data. (2) Environmental Impact. The experiments are conducted on Nvidia A100 GPUs and consume approximately 350 GPU hours. This results in a substantial amount of carbon emissions, which has a negative influence on our environment (Strubell et al., 2019). (3) Intended Use. OmniEvent can be utilized to provide event understanding services for users, and it can also serve as a toolkit to assist researchers in developing and evaluating models. (4) Misuse risks. OmniEvent should not be utilized for processing and analyzing sensitive or uncopyrighted data. The output of OmniEvent is determined by the input text and should not be used to support financial or political claims.

Figure 1: An illustration for the event understanding tasks, including event detection (ED), event argument extraction (EAE), and event relation extraction (ERE).

Figure 2: Overview of the OmniEvent toolkit. OmniEvent can serve as a system offering event understanding services to users, while also serving as a toolkit for researchers in model development and evaluation. OmniEvent provides pre-processing scripts for widely-used datasets and converts the datasets into a unified data format. OmniEvent provides modular components and users can easily develop a new model based on the components. OmniEvent also supports large language models (T5-XXL (Raffel et al., 2020) and FLAN-UL2 (Tay et al., 2023)).

Figure 3: Example of the online demonstration. We re-arrange the layout of the website for a compact presentation. Best viewed in color.

Table 2: Currently supported datasets in OmniEvent. Italics represent Chinese datasets.

Table 4: Experimental results (%) of the implemented pairwise-based ERE model in OmniEvent on various ERE datasets. The backbone is RoBERTa BASE. The evaluation metric for coreference is B-cubed (Bagga and Baldwin, 1998).