EventOA: An Event Ontology Alignment Benchmark Based on FrameNet and Wikidata

Event ontology provides a shared and formal specification of what happens in the real world and can benefit many natural language understanding tasks. However, the independent development of event ontologies often results in heterogeneous representations, which raises the need to establish alignments between semantically related events. A series of works has studied ontology alignment (OA), but they focus only on entity-based OA and neglect event-based OA. To fill the gap, we construct an Event Ontology Alignment (EventOA) dataset based on FrameNet and Wikidata, which consists of 900+ event type alignments and 8,000+ event argument alignments. Furthermore, we propose a multi-view event ontology alignment (MEOA) method, which utilizes description information (i.e., name, alias and definition) and neighbor information (i.e., subclass and superclass) to obtain richer representations of the event ontologies. Extensive experiments show that MEOA outperforms existing entity-based OA methods and can serve as a strong baseline for EventOA research.


Introduction
Event ontology is crucial for understanding human behavior and has become a new paradigm for describing knowledge in the Semantic Web by providing a shared and formal specification of what happens in the real world (Brown et al., 2017). As shown in Figure 1, the event Attack describes the action in which someone attempts to injure another organism, with arguments such as "assailant", "victim" and "weapon". Event ontology has been recognized as useful for tasks like information extraction (Wimalasuriya and Dou, 2010), web services (Li and Yang, 2008) and automatic question answering (Lopez et al., 2011). Thus a remarkable number of event ontologies have been created, such as FrameNet (Baker et al., 1998), VerbNet (Kipper et al., 2007), Wikidata (Erxleben et al., 2014) and ACE (Doddington et al., 2004). However, the independent development of event ontologies often results in heterogeneous representations that hinder knowledge integration. Driven by the Ontology Alignment Evaluation Initiative (OAEI) (Pour et al., 2022), many datasets (Bodenreider et al., 2005; Svátek and Berka, 2005; Karam et al., 2020) and methods (Jiménez-Ruiz et al., 2013; Faria et al., 2013; Iyer et al., 2021) have been proposed for ontology alignment (OA). However, almost all datasets and methods so far focus on entity ontologies, which share knowledge about entities such as people, organizations and products (Bodenreider et al., 2005; Zamazal and Svátek, 2017). In contrast, event ontologies, which link related entities/arguments at a higher semantic granularity, are more useful for language understanding tasks (Brown et al., 2017), yet there has been little attempt to tackle event-based OA.
To address the above issues, we take FrameNet (Baker et al., 1998) and Wikidata (Erxleben et al., 2014) as examples to explore the alignment between event ontologies. As illustrated in Figure 1, an OA system for Wikidata and FrameNet needs to establish correspondences between event types, such as Assault vs. Attack, and between event arguments, such as "armament" vs. "weapon". We choose FrameNet and Wikidata as the data sources for the following reasons.
First, establishing correspondences between FrameNet and Wikidata is meaningful, as it helps to obtain an integrated event ontology with high coverage and quality. On the one hand, Wikidata is a widely used, community-built world knowledge base that has a large number of events but a confusing hierarchy (Pellissier Tanon et al., 2020). On the other hand, FrameNet is a high-quality repository of linguistic knowledge designed by linguists, which has a logically clean hierarchy but limited events. Specifically, Wikidata contains 290K events covering a wide range of domains, including disaster, sport and election. However, its hierarchy is confusing because anyone can edit the relations between events. For example, Writing is a subclass of Artistic_creation, while Carving is a subclass of Change, so a query for Artistic_creation finds Writing but not Carving, although both Writing and Carving are artistic creation activities. In contrast, FrameNet has an agreed-upon hierarchy that can only be changed by agreement among linguists, but it does not cover recent major events such as 2022_FIFA_World_Cup. Thus an ontology that reconciles the rigorous hierarchy of FrameNet with the rich events of Wikidata is valuable for applications in the Semantic Web.
Second, establishing correspondences between FrameNet and Wikidata is challenging due to the semantic diversity of the lexemes describing event types and arguments (i.e., polysemy and synonymy). (i) Polysemy refers to the phenomenon that ontologies use the same lexeme to describe events with different senses. For example, Motion in FrameNet describes everyday events in which "Agents change in position over time", while Motion_Q452237 in Wikidata describes a "parliamentary motion" that has occurred throughout human history. Polysemy also occurs when the semantics of arguments vary across event types (Li et al., 2006). As shown in Figure 1, the argument "perpetrator" of event Assault corresponds to "assailant" of event Attack, while the argument "perpetrator" of event Invasion corresponds to "invader" of event Invading. Identifying the different senses of the same lexeme is therefore challenging. (ii) Synonymy refers to the phenomenon that FrameNet and Wikidata use different lexemes to refer to the same event types or arguments. As shown in Figure 1, Assault and Attack express the same event type, and "armament" and "weapon" express the same event argument, each with different lexemes. Thus it is critical to establish correspondences between items that are semantically related but lexically different.
To this end, we build an event ontology alignment dataset based on FrameNet and Wikidata. This dataset, named EventOA, is composed of two sub-datasets: event type alignment and event argument alignment. We extensively evaluate existing OA methods and find that they are far from solving EventOA. We therefore propose a multi-view event ontology alignment (MEOA) method that exploits multi-view information of event ontologies, which we believe can serve as a strong baseline for EventOA. We further propose evaluation metrics for EventOA that cover both type alignment and argument alignment. Our contributions are as follows:
• We construct EventOA, a real-world event ontology alignment dataset based on FrameNet and Wikidata, which consists of two subtasks, namely event type alignment and event argument alignment. In addition, we devise evaluation metrics for the two subtasks to assess alignment quality.
• We propose a multi-view event ontology alignment (MEOA) method, which utilizes multi-view information to represent event ontologies and thus better resolves the semantic diversity problem.
• We conduct extensive evaluations of existing entity-based OA methods and our MEOA method. Experimental results show that MEOA outperforms the entity-based methods and achieves state-of-the-art (SOTA) performance, so it can serve as a strong baseline for EventOA research. We also conduct a detailed error analysis to provide insights for future work.

Data Construction of EventOA
We construct our dataset in four stages: FrameNet-based ontology collection, Wikidata-based ontology collection, automatic alignment candidate selection and human annotation.

FrameNet-based Ontology Collection
FrameNet (Fillmore, 1976; Baker et al., 1998) is a linguistic resource constructed by linguists, which describes everyday events with agreed-upon inheritance relations. Thus we construct the FrameNet-based ontology by collecting event types and arguments from FrameNet and building the hierarchy from the inheritance relations.
In particular, a frame (Guan et al., 2021) is defined as a schematic representation of a situation. Frame Elements (FEs) are frame-specific semantic roles. Lexical Units (LUs) are words grouped by their senses, each belonging to a particular frame. Frames are linked by frame-to-frame (F-to-F) relations such as "Inheritance" and "Subframe", and the relations between FEs (FE-to-FE) mirror the corresponding relations between their frames. For instance, in Figure 2, FE "assailant" inherits from FE "agent" because frame Attack inherits from frame Intentionally_affect.
Based on the FrameNet inheritance relations, we construct the FrameNet-based ontology, where a frame is viewed as an event type, an FE is viewed as an event argument, and F-to-F and FE-to-FE relations reflect the relations among events and among arguments, respectively. We build the RDFS schema for FrameNet following FrameBase, which translates frames, frame elements, F-to-F and FE-to-FE relations into their RDFS counterparts (Rouces et al., 2015).
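The mapping above (frame to event type, FE to argument, F-to-F and FE-to-FE to hierarchy relations) can be sketched as a small triple generator. This is a minimal illustration only: the `fn:` prefix, the `Frame` class and the use of `rdfs:subClassOf` / `rdfs:subPropertyOf` are our assumptions for readability and do not reproduce FrameBase's actual vocabulary or URIs.

```python
from dataclasses import dataclass, field

# Hypothetical prefixes for illustration only; FrameBase defines its own
# vocabulary and URIs, which this sketch does not reproduce.
FN = "fn:"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SUBPROPERTY_OF = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"


@dataclass
class Frame:
    name: str
    fes: list[str]
    # F-to-F "Inheritance": names of parent frames.
    parents: list[str] = field(default_factory=list)
    # FE-to-FE: child FE name -> (parent frame name, parent FE name).
    fe_parents: dict[str, tuple[str, str]] = field(default_factory=dict)


def frames_to_triples(frames: list[Frame]) -> set[tuple[str, str, str]]:
    """Map frames to event-type classes and FEs to argument properties."""
    triples: set[tuple[str, str, str]] = set()
    for frame in frames:
        cls = FN + frame.name
        triples.add((cls, RDF_TYPE, "rdfs:Class"))
        for parent in frame.parents:
            triples.add((cls, SUBCLASS_OF, FN + parent))
        for fe in frame.fes:
            # FE names repeat across frames, so qualify each FE by its frame.
            prop = f"{cls}.{fe}"
            triples.add((prop, RDF_TYPE, "rdf:Property"))
            triples.add((prop, DOMAIN, cls))
            if fe in frame.fe_parents:
                parent_frame, parent_fe = frame.fe_parents[fe]
                triples.add((prop, SUBPROPERTY_OF, f"{FN}{parent_frame}.{parent_fe}"))
    return triples


if __name__ == "__main__":
    attack = Frame(
        name="Attack",
        fes=["Assailant", "Victim", "Weapon"],
        parents=["Intentionally_affect"],
        fe_parents={"Assailant": ("Intentionally_affect", "Agent")},
    )
    for triple in sorted(frames_to_triples([attack])):
        print(triple)
```

Qualifying each FE by its frame keeps same-named FEs in different frames (e.g., "Agent") distinct, which matters for the polysemy problem discussed in the Introduction.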

Wikidata-based Ontology Collection
Wikidata is a community effort where anybody can contribute facts, resulting in a confusing knowledge base that includes, among other issues, cycles in its class hierarchy. Thus we construct the Wikidata-based ontology by processing these confusions as follows: (1) Data acquisition. Inspired by Gottschalk and Demidova (2019), we run the SPARQL query in Figure 4 of Appendix B to select subclasses of Wikidata's "occurrence" as our event dataset. (2) Circle-path filtration. For an event on a cycle, we only retain the path with the smallest depth to the root "occurrence". (3) Useless-event deletion. For each path, we discard classes that have fewer than 10 direct instances and directly assert their children as subclasses of their parents to preserve the hierarchy. (4) Argument completion. Given an event, we collect all its direct instances and use the union of the instances' properties as its arguments; as shown in Figure 1, the arguments of event Assault (e.g., "victim" and "perpetrator") are obtained from its instances.
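Steps (2) to (4) can be sketched on an in-memory class graph. This is a minimal sketch under our own assumptions: the input dictionaries (`parents`, `instance_count`, `instances_of`, `instance_props`) are hypothetical stand-ins for the SPARQL results, and we interpret "retain the path with the smallest depth" as keeping, for every class, one parent found by breadth-first search from the root.

```python
from collections import deque

ROOT = "occurrence"
MIN_INSTANCES = 10  # threshold from step (3)


def shortest_depth_tree(parents: dict[str, set[str]], root: str = ROOT) -> dict[str, str]:
    """Step (2): keep one parent per class, on a shortest path to the root.

    BFS from the root assigns every reachable class a single depth, so
    edges that close a cycle are never kept.
    """
    children: dict[str, set[str]] = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    depth = {root: 0}
    tree_parent: dict[str, str] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in sorted(children.get(node, ())):  # sorted for determinism
            if child not in depth:  # first visit = smallest depth
                depth[child] = depth[node] + 1
                tree_parent[child] = node
                queue.append(child)
    return tree_parent


def prune_sparse(tree_parent, instance_count, min_instances=MIN_INSTANCES, root=ROOT):
    """Step (3): drop classes with < min_instances direct instances and
    re-attach their children to the nearest kept ancestor."""
    def kept(cls):
        return cls == root or instance_count.get(cls, 0) >= min_instances

    pruned = {}
    for child, parent in tree_parent.items():
        if not kept(child):
            continue
        while not kept(parent):  # terminates: the root is always kept
            parent = tree_parent[parent]
        pruned[child] = parent
    return pruned


def complete_arguments(instances_of, instance_props):
    """Step (4): an event's arguments = union of its direct instances' properties."""
    return {
        event: set().union(*(instance_props.get(i, set()) for i in insts))
        for event, insts in instances_of.items()
    }
```

For example, with a cycle `conflict -> assault -> violence -> conflict`, BFS keeps `assault` under whichever parent is closest to `occurrence`, and a sparse class such as `violence` is removed while its children move up to `conflict`.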

Automatic Alignment Candidate Selection
Given the two event ontologies (FrameNet and Wikidata), our goal is to identify correspondences between event types (frame and class) and between event arguments (FE and property). To make annotation efficient, we adopt heuristic and automatic methods to select alignment candidates.
Event type candidate selection aims to select candidate frames in FrameNet for a given event class in Wikidata. We apply Frame-based and LU-based methods, combined with a Similarity-based method, for event type candidate selection.
The Frame-based method selects as candidates the frames whose names are identical to any form of the Wikidata event class. The LU-based method selects as candidates the frames whose lexical units are identical to any form of the Wikidata event class.
The Similarity-based method supplements the candidate set when the Frame- and LU-based methods together select fewer than 15 candidates. It selects candidates by computing the similarity between the class representation $S_c$ and the frame representation $S_f$. We use the frame name $F_n$ and the lexical units $F_{lu}$ to build $S_f$ (Guo et al., 2020). $F_{lu}$ is obtained by averaging the embeddings of all LUs in a frame, i.e., $F_{lu} = \frac{1}{M}\sum_{i=1}^{M} lu_i$, where $M$ is the total number of LUs in the frame. $S_c$, $F_n$ and $lu_i$ are pre-trained GloVe embeddings (Pennington et al., 2014).
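The three candidate selectors can be combined as in the sketch below. It is a hedged illustration, not the authors' code: the `frames` dictionary layout, the use of lowercased label/alias strings as the class "forms", and in particular the averaging of $F_n$ and $F_{lu}$ into $S_f$ are our assumptions, since the text does not specify how the two are combined.

```python
import math

MIN_CANDIDATES = 15  # threshold stated in the text


def normalize(text):
    return text.replace("_", " ").lower()


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """F_lu = (1/M) * sum_i lu_i (empty list -> empty vector)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def frame_vector(frame):
    # Assumption: S_f averages F_n and F_lu; the text does not say how they are combined.
    f_lu = mean_vector(frame["lu_vecs"])
    return [(a + b) / 2 for a, b in zip(frame["name_vec"], f_lu)]


def select_candidates(class_forms, class_vec, frames, min_candidates=MIN_CANDIDATES):
    """class_forms: surface forms of a Wikidata class (e.g. label and aliases).
    frames: {frame_name: {"lus": [...], "name_vec": [...], "lu_vecs": [[...], ...]}}."""
    forms = {normalize(form) for form in class_forms}
    # Frame-based: the frame name equals some form of the class.
    candidates = [name for name in frames if normalize(name) in forms]
    # LU-based: some lexical unit of the frame equals a form of the class.
    for name, frame in frames.items():
        if name not in candidates and any(normalize(lu) in forms for lu in frame["lus"]):
            candidates.append(name)
    # Similarity-based: top up with the most similar remaining frames.
    if len(candidates) < min_candidates:
        rest = [name for name in frames if name not in candidates]
        rest.sort(key=lambda name: cosine(class_vec, frame_vector(frames[name])), reverse=True)
        candidates.extend(rest[: min_candidates - len(candidates)])
    return candidates
```

Exact matches come first so that the similarity ranking only fills the remaining slots; with the stated threshold, every Wikidata class ends up with at least 15 candidates whenever FrameNet has that many frames.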
Event argument candidate selection constructs candidate FEs for each property in Wikidata. We apply a Relation-aware Attention Mechanism for argument candidate selection.
For each property in Wikidata, we use the FEs of the corresponding frame as candidates and rank them by the similarity between the property $p$ and each FE $fe$. Specifically, we integrate the nominal and relational perspectives of an FE into a more comprehensive representation, as shown in Equation (1), where $FE_n$ represents the nominal perspective and $\sum_{w=1}^{W} att(FE_w) \cdot FE_w$ represents the relational perspective. We use FE-to-FE relations to model the relational perspective of an FE with an attention scheme. Given an $FE$, $FE^{+} = \{FE_1, FE_2, \ldots, FE_W\}$ denotes its expanded FEs, i.e., all FEs that can be linked to $FE$ through FE-to-FE relations. The attention scheme is designed to emphasize relevant FEs and avoid the influence of less relevant but linked FEs.
$$fe = FE_n + \sum_{w=1}^{W} att(FE_w) \cdot FE_w \quad (1)$$
where $FE_n$ is the FE name representation and $W$ is the total number of FEs in $FE^{+}$. We use the same method to obtain the property representation $p$.
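A minimal sketch of the relation-aware FE representation follows. Two choices here are our assumptions, not details from the text: $FE^{+}$ is computed as the transitive closure over FE-to-FE links, and $att(\cdot)$ is a softmax over dot products between $FE_n$ and each linked FE's name embedding (the paper's attention may be parameterized differently).

```python
import math
from collections import deque


def expand(fe, fe_links):
    """FE^+: all FEs reachable from `fe` through FE-to-FE relations."""
    seen, queue = {fe}, deque([fe])
    while queue:
        current = queue.popleft()
        for nxt in fe_links.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(fe)
    return sorted(seen)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fe_representation(fe, name_vecs, fe_links):
    """fe = FE_n + sum_w att(FE_w) * FE_w.

    Assumption: att is a softmax over dot products with FE_n.
    """
    own = name_vecs[fe]
    linked = expand(fe, fe_links)
    if not linked:  # no FE-to-FE relations: nominal perspective only
        return list(own)
    att = softmax([sum(a * b for a, b in zip(own, name_vecs[w])) for w in linked])
    relational = [sum(a * name_vecs[w][d] for a, w in zip(att, linked)) for d in range(len(own))]
    return [o + r for o, r in zip(own, relational)]
```

Because the weights sum to one, the relational term is a convex combination of linked FEs, so a loosely related FE with a low score contributes little, which is the stated purpose of the attention scheme.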

Human Annotation
We obtain candidates for each event through the above process, but the semantic distinctions among candidates are subtle, so it is difficult to automatically select the best alignment with guaranteed quality.
To create a gold-standard dataset of event ontology alignment, we invite three graduate students who are familiar with Wikidata and FrameNet to label each class with the appropriate frame and each property with the appropriate FE using our internal annotation platform (a screenshot of the annotation interface is shown in Appendix E). They work independently, and we adopt majority voting to decide the final correspondences when disagreements arise. The mean inter-annotator agreement measured by Cohen's Kappa is 82.4%, indicating high annotation quality. Examples of alignments are provided in Appendix A.
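The agreement statistic and the vote can be sketched as follows. We assume that "mean Cohen's Kappa" means the average of pairwise kappas over the three annotator pairs, and that a three-way disagreement yields no majority (returned as `None` here); the text does not say how such ties are resolved.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    # Convention: if chance agreement is 1, both used one identical label.
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations):
    """Average kappa over all annotator pairs (3 pairs for 3 annotators)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def majority_vote(labels):
    """Label chosen by a strict majority, or None if there is none."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None
```

With three annotators, any label chosen by two of them wins the vote.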

Multi-view Event Ontology Alignment (MEOA)
To solve the semantic diversity problems, we design MEOA, which establishes correspondences between event ontologies by utilizing multi-view information, as shown in Figure 3.

Multi-view Representation (MR)
MR aims to represent FrameNet and Wikidata from five different views: name, alias, definition, subclass and superclass. We choose these views because they capture the description (name, alias and definition) and neighbor (subclass and superclass) information of an event.
Denote the different views as $P = \{P_1, P_2, \ldots, P_i, \ldots, P_N\}$, where $N$ is the number of views, and $P_i = \{P_{i1}, P_{i2}, \ldots, P_{ij}, \ldots, P_{iM}\}$, where $P_{ij}$ is the $j$-th element of view $P_i$.
For $P_{ij}$, we feed its words $\{w_{ij1}, w_{ij2}, \ldots, w_{ijk}, \ldots, w_{ijK}\}$ into a transformer-based encoder (Vaswani et al., 2017) to generate $p_{ij}$:
$$p_{ij} = \mathrm{Encoder}(w_{ij1}, w_{ij2}, \ldots, w_{ijK}) \quad (2)$$
where $w_{ijk}$ is the $k$-th word in $P_{ij}$ and $K$ is the total number of words. For arguments, we model the name representation with the help of the event type by directly summing the argument and event type name embeddings.
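The structure of the multi-view representation can be sketched as below. To keep the example self-contained, the transformer encoder is replaced by a hypothetical stand-in, `encode`, that averages toy word vectors; only the organization into views and the summed argument-name embedding follow the text.

```python
VIEWS = ("name", "alias", "definition", "subclass", "superclass")


def encode(text, word_vecs, dim):
    """Stand-in for the transformer encoder: mean of known word vectors."""
    words = text.replace("_", " ").lower().split()
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def multi_view_representation(event, word_vecs, dim):
    """Return {view P_i: [p_i1, ..., p_iM]} for one event."""
    return {
        view: [encode(element, word_vecs, dim) for element in event.get(view, [])]
        for view in VIEWS
    }


def argument_name_embedding(event_name, arg_name, word_vecs, dim):
    """Argument name view: sum of the event-type and argument name embeddings."""
    e = encode(event_name, word_vecs, dim)
    a = encode(arg_name, word_vecs, dim)
    return [x + y for x, y in zip(e, a)]
```

Summing the event-type embedding into the argument name gives the same argument string (e.g., "perpetrator") different vectors under Assault and Invasion, which targets the argument polysemy described in the Introduction.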

Multi-view Fusion (MF)
MF aims to integrate multi-view embeddings to get a more meaningful representation.
Intuitively, the combination of multi-view embeddings can strengthen the event representation.To model the multi-view information and interactions among different views, a multi-view Event Ontology Graph (EOG) is constructed.
EOG has five kinds of nodes, corresponding to the five views described in Section 3.1, and two types of edges:
Intra-view Edge: Nodes of the same view are connected with intra-view edges, so that interactions among different nodes of the same view can be modeled.
Inter-view Edge: Different views are connected with inter-view edges if they belong to the same event, which can be further divided into Description edge and Neighbor edge.
Description Edge connects alias view and definition view to the name view.The rationale is that equivalent events tend to share similar or even the same notions.
Neighbor Edge connects superclass view and subclass view to the name view.The rationale is that equivalent events tend to be neighbored by equivalent events.
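The edge rules above can be sketched for a single event as follows. This is a hypothetical construction under our own reading: intra-view edges fully connect the elements of one view within an event, and every alias, definition, subclass and superclass node is linked to the name node(s); the returned adjacency matrix is what a GCN would consume.

```python
from itertools import combinations

DESCRIPTION_VIEWS = ("alias", "definition")
NEIGHBOR_VIEWS = ("subclass", "superclass")


def build_eog(view_sizes):
    """view_sizes: {view: number of elements}. Returns (nodes, adjacency matrix)."""
    nodes = [(view, j) for view, m in view_sizes.items() for j in range(m)]
    index = {node: k for k, node in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]

    def connect(u, v):
        adj[index[u]][index[v]] = adj[index[v]][index[u]] = 1  # undirected

    by_view = {}
    for node in nodes:
        by_view.setdefault(node[0], []).append(node)
    # Intra-view edges: connect all elements of the same view.
    for members in by_view.values():
        for u, v in combinations(members, 2):
            connect(u, v)
    # Inter-view edges: description and neighbor views attach to the name view.
    for view in DESCRIPTION_VIEWS + NEIGHBOR_VIEWS:
        for u in by_view.get(view, []):
            for v in by_view.get("name", []):
                connect(u, v)
    return nodes, adj
```

Routing all inter-view edges through the name node makes it the hub where description and neighbor evidence meet, matching the rationale given for both edge types.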
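We apply a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) on EOG to aggregate information. Formally, the hidden representations of the nodes at the $(l+1)$-th layer are computed by:
$$H^{(l+1)} = \phi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \quad (3)$$
where $H^{(l)}$ is the matrix of node representations at the $l$-th layer, $\tilde{A} = A + I$ is the adjacency matrix of EOG with added self-connections, $I$ is the identity matrix, $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$, $\phi(\cdot)$ is the ReLU function, and $W^{(l)}$ denotes the learnable parameters of the $l$-th layer.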
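The layer-wise propagation rule of Kipf and Welling (2017) can be written out in plain Python to make each factor explicit. This is an illustrative sketch with lists of lists; a real implementation would use a tensor library and learn `w`.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D~^-1/2 (A + I) D~^-1/2 H W)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # diagonal of D~ (always >= 1)
    norm = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU
```

The self-loops guarantee every degree is at least 1, so the symmetric normalization never divides by zero, even for isolated view nodes.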

Alignment Prediction (AP)
AP aims to establish correspondences between semantically related events from different ontologies.
For FrameNet ($E_1$) and Wikidata ($E_2$), we define the correspondence between two events $e_1 \in E_1$ and $e_2 \in E_2$ as a three-element tuple $T = \langle e_1, e_2, s \rangle$, where $(f, fe) \in e_1$, $(c, p) \in e_2$, and $s \in [0, 1]$ is a score indicating the degree to which $e_1$ and $e_2$ are equivalent. The event type representations $(f, c)$ and argument representations $(fe, p)$ are obtained from the MF module. We compute the confidence scores of type alignment $S_t$ and argument alignment $S_a$, respectively, and, following Iyer et al. (2021), use mean squared error as the training loss. We take event type alignment as an example to elaborate the process:
$$S_t(f_i, c_j) = \mathrm{cos\_sim}(f_i, c_j) \quad (4)$$
$$\mathcal{L}_t = \frac{1}{N}\sum_{i,j} \left(S_t(f_i, c_j) - G_t(f_i, c_j)\right)^2 \quad (5)$$
where $S_t(\cdot)$ denotes the confidence score of event type alignment, $G_t(\cdot)$ denotes the ground-truth label, which is 1 if $f_i \equiv c_j$ and 0 otherwise, and $N$ is the number of training pairs. The process of argument alignment prediction is the same as that of type alignment; details can be found in Appendix C.
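Scoring and the training objective reduce to a few lines. In this sketch `frame_vecs` and `class_vecs` stand for representations produced by the MF module, and the loss is averaged over every scored pair; how negative pairs are sampled in practice is not stated in the text, so treat that as an open assumption.

```python
import math


def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_scores(frame_vecs, class_vecs):
    """S_t(f_i, c_j) for every frame/class pair."""
    return {
        (f, c): cos_sim(fv, cv)
        for f, fv in frame_vecs.items()
        for c, cv in class_vecs.items()
    }


def mse_loss(scores, gold):
    """gold: set of (frame, class) pairs with G_t = 1; all other pairs have G_t = 0."""
    errors = [(s - (1.0 if pair in gold else 0.0)) ** 2 for pair, s in scores.items()]
    return sum(errors) / len(errors)
```

The same two functions apply unchanged to argument alignment by passing FE and property representations instead.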

Experiments
This section provides experiment details, i.e., evaluation metrics, baselines, results, and their analysis.

Evaluation Metrics
Inspired by Faria et al. (2013) and Ji and Grishman (2008), we define two standards to determine the correctness of an alignment (type and argument):
• An event type alignment is correctly identified if it matches a reference event type alignment.
• An event argument alignment is correctly identified if both its event type alignment and its argument alignment match one of the reference argument alignments.
Based on these standards, we compute precision, recall and F-measure:
$$P = \frac{|M_{Out} \cap M_{RA}|}{|M_{Out}|}, \quad R = \frac{|M_{Out} \cap M_{RA}|}{|M_{RA}|}, \quad F = \frac{2 \cdot P \cdot R}{P + R}$$
where $M_{Out}$ denotes the system's output alignments and $M_{RA}$ denotes the reference (a.k.a. gold) alignments.
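The two correctness standards can be implemented by choosing what each alignment tuple contains. In this sketch, type alignments are `(frame, class)` pairs and argument alignments are `(frame, class, fe, property)` tuples, so an argument alignment only counts when its type pair matches as well; the tuple layout is our illustrative choice.

```python
def prf(output, reference):
    """Precision, recall and F-measure over sets of alignments."""
    correct = len(output & reference)
    p = correct / len(output) if output else 0.0
    r = correct / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    type_out = {("Attack", "Assault"), ("Motion", "Motion_Q452237")}
    type_ref = {("Attack", "Assault"), ("Invading", "Invasion")}
    print(prf(type_out, type_ref))

    # The right FE/property pair under the wrong type pair is not counted.
    arg_out = {("Attack", "Assault", "Weapon", "armament"), ("Attack", "Invasion", "Weapon", "armament")}
    arg_ref = {("Attack", "Assault", "Weapon", "armament"), ("Attack", "Assault", "Victim", "victim")}
    print(prf(arg_out, arg_ref))
```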

Data Splitting and Baseline Models
We consider two settings for data splitting: (i) the entire dataset is treated as the test set, which is suitable for comparing unsupervised OA methods; (ii) the dataset is split into training, validation and test sets in a 70:10:20 ratio, which can be used to evaluate supervised OA methods. We compare MEOA with various baselines: (i) unsupervised OA methods, namely AML (Faria et al., 2013), LogMap (Jiménez-Ruiz and Cuenca Grau, 2011; Jiménez-Ruiz et al., 2020) and Wiktionary (Portisch et al., 2019); (ii) supervised OA methods such as Word2Vec + classifier (He et al., 2022) and VeeAlign (Iyer et al., 2021). We choose these baselines for their top performance and open-source availability. The Word2Vec + classifier method concatenates the embeddings of two event types or arguments and feeds them to a classifier trained on the training set to output an alignment score. For our unsupervised variant, MEOA-sum, we directly sum the view embeddings instead of applying the GCN, unifying the different views into a single vector for each event to calculate the alignment score.
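Both the supervised split and the MEOA-sum scoring are simple to sketch. We assume a seeded random shuffle before the 70:10:20 split (the text does not say whether the split is random or stratified) and reuse cosine similarity as the alignment score, as in the supervised model.

```python
import math
import random


def split_70_10_20(items, seed=0):
    """Shuffle alignments and split them 70:10:20 into train/validation/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10  # integer arithmetic avoids float rounding
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def sum_views(view_vectors):
    """MEOA-sum: element-wise sum of all view embeddings of one event."""
    return [sum(col) for col in zip(*view_vectors)]


def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def meoa_sum_score(frame_views, class_views):
    """Alignment score without the GCN: cosine of the summed view vectors."""
    return cos_sim(sum_views(frame_views), sum_views(class_views))
```

Since MEOA-sum has no trainable parameters, it can be evaluated in setting (i) on the whole dataset, alongside the other unsupervised systems.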
Details about baselines and the implementation of our MEOA are provided in Appendix D.

Results and Discussion
We demonstrate the effectiveness of the proposed MEOA method and the challenges of EventOA.
Performance comparison of different methods on EventOA.

Ablation Study
We conduct ablation studies on event type alignment in Table 5. From the

Case Study
To show the effect of MEOA, Table 6 lists some alignments that are correctly predicted by MEOA but not by AML or LogMap. These cases show that MEOA can resolve semantic diversity problems by capturing implicit connections between ontologies. For instance, MEOA recognizes that War corresponds to Hostile_encounter from the alias information (i.e., War matches an alias of Hostile_encounter), and that "Armament" corresponds to "Weapon" by utilizing the arguments' definition and relation information. This demonstrates the strength of modeling multi-view information to improve alignment performance.

Error Analysis
Table 7 shows examples of error cases. Note that these cases are also incorrectly predicted by LogMap and AML. (1) Ambiguity, where the distinctions between events are subtle. For instance, Medical_intervention differs from Cure mainly in the effect of the treatment: Medical_intervention deals only with attempts to alleviate a Medical_condition, whereas Cure deals with situations in which the Medical_condition has been cured. It is therefore difficult for a model to decide which of these frames corresponds to the event Treatment.
(2) Compound Word, where event types are formed from two or more words, which makes it difficult to derive accurate representations for them. For example, Deliberate_murder corresponds to Killing because "deliberate" merely modifies "murder", whereas the words in Surgical_operation combine to take on a new meaning that corresponds to Medical_intervention.
(3) Spurious Correlation, where misleading co-occurrences between arguments lead models to learn spurious relationships that generalize poorly. For example, "Vehicle" relates to "Speed" in many cases, so models learn this spurious correlation and fail to generalize to "Impact", where "Vehicle" corresponds to "Impactors". (4) Same Category, where models fail to discriminate between arguments of the same category. For "Director", the model outputs "Performer" while the gold argument is "Personnel"; both belong to the people category, and the model cannot further distinguish their semantics.

Related Work
As this work involves datasets and methods about ontology alignment, we review key related works in these areas.
The OAEI has been the foremost venue for research on the OA task, so we begin with a survey of the datasets used in the OAEI. Anatomy is one of the longest-running tracks in the OAEI; it consists of human and mouse anatomy ontologies from the biomedical domain that have been manually matched by medical experts (Bodenreider et al., 2005; Dragisic et al., 2017). Biodiv is particularly useful for biodiversity and ecology research (Karam et al., 2020). Conference is a collection of ontologies from the domain of conference organization that use complex definitions (Svátek and Berka, 2005). All of these datasets address entity-based OA and neglect event-based OA.
Methods of OA can be classified into feature-based methods and deep learning based methods. Feature-based methods typically rely on lexical matching. Among these systems, AgreementMakerLight (AML) (Faria et al., 2013) and LogMap (Jiménez-Ruiz and Cuenca Grau, 2011) are two classic and leading systems in many OAEI tracks and other tasks (Kolyvakis et al., 2018). Wiktionary (Portisch et al., 2019) is another top-performing system for multilingual OA. Recently, some works have explored deep learning based OA methods. VeeAlign (Iyer et al., 2021) is one of the representative methods, which utilizes word embeddings to predict alignments.
Although some OA datasets and methods have been investigated and developed, there are at present no well-established benchmarks for event ontology alignment. In this paper, we propose EventOA, an event ontology alignment dataset, which can be used to understand events and to evaluate systems that analyze real-world events.

C Event Argument Alignment Prediction
We compute the confidence score of event argument alignment $S_a$ as the similarity between $fe$ in FrameNet and $p$ in Wikidata:
$$S_a(fe_i, p_j) = \mathrm{cos\_sim}(fe_i, p_j) \quad (10)$$
We further use mean squared error as the loss to train our model (Iyer et al., 2021):
$$\mathcal{L}_a = \frac{1}{N}\sum_{i,j} \left(S_a(fe_i, p_j) - G_a(fe_i, p_j)\right)^2 \quad (11)$$
where $S_a(\cdot)$ denotes the confidence score of event argument alignment, $G_a(\cdot)$ denotes the ground-truth label, which is 1 if $fe_i \equiv p_j$ and 0 otherwise, and $N$ is the number of training pairs.

D Implementation Details and Baselines
We compare MEOA with various baselines. Specifically, AML (Faria et al., 2013) mixes various string-based matching methods to calculate matching scores. LogMap (Jiménez-Ruiz and Cuenca Grau, 2011; Jiménez-Ruiz et al., 2020) starts with a set of anchor mappings obtained from lexical comparison and then alternates between mapping repair and mapping discovery. Wiktionary (Portisch et al., 2019) is another top-performing OA system that relies on the Wiktionary knowledge base. Word2Vec + classifier uses the Word2Vec vectors of event names and aliases to discover alignments. VeeAlign (Iyer et al., 2021) uses a dual-attention mechanism to determine the similarity between two ontologies.
We fine-tune MEOA for 6 epochs with a batch size of 48 and evaluate it on the validation set every 0.1 epoch, selecting the best checkpoint for prediction. The learning rate is set to 2e-5, and the loss function is mean squared error. Training uses a single NVIDIA GeForce RTX 3090 GPU.

Figure 2: An example of F-to-F and FE-to-FE relations. Solid lines represent F-to-F relations; dashed lines represent FE-to-FE relations.

Figure 4: The SPARQL query to obtain all subclasses of Wikidata's "occurrence".

Table 1 :
Statistics of the EventOA dataset.

Table 2 :
Comparison between entity and event OA datasets.Arg and Pro refer to argument and property, respectively.Note event includes type and argument, and entity includes class and property.

As shown in Table 1, the FrameNet-based ontology contains 1,221 event types and 11,428 arguments, and the Wikidata-based ontology includes 12,159 event types and 257,498 arguments. Through automatic selection and human annotation, the EventOA dataset contains 905 event type alignments and 8,650 event argument alignments, which is rich enough to promote research on event ontology alignment.
Comparison between entity and event OA datasets. We compare EventOA with existing widely used entity OA datasets in Table 2, from which we observe that: (1) The sizes of the FrameNet ontology and the Wikidata ontology differ significantly (e.g., 1,221 vs. 12,159 event types), whereas the entity ontologies in each track have similar magnitudes. (2) Entity ontologies contain very few properties (e.g., 136 in ENVO), while our EventOA has a much larger number of arguments (e.g., 257,489 in Wikidata). This is because arguments are defined specifically for each event type, which leads to diverse representations.
Table 3 presents the performance of our MEOA model on EventOA (including event type and event argument alignment) compared with top-performing unsupervised/supervised entity-based OA methods. From the table, we can see that: (1) MEOA achieves the highest F-measure on EventOA and significantly outperforms the baselines (t-test, p-value < 0.05),

Table 3 :
Results on EventOA.Comparison of our proposed MEOA method with top performing unsupervised/supervised entity-based OA methods on event type and event argument alignment.

Table 4 :
Experimental results of MEOA on three entity-based datasets and our EventOA, indicating the challenge of EventOA and the generalization of MEOA.

Table 5 :
Ablation study on event type alignment.

Table 6 :
Examples of alignments that are correctly predicted by MEOA.Arg refers to argument.