Fine-grained Information Extraction from Biomedical Literature based on Knowledge-enriched Abstract Meaning Representation

Biomedical Information Extraction from scientific literature presents two unique and non-trivial challenges. First, compared with general natural language text, sentences from scientific papers usually span much wider contexts between knowledge elements. Second, comprehending fine-grained scientific entities and events requires domain-specific background knowledge. In this paper, we propose a novel biomedical Information Extraction (IE) model to tackle these two challenges and extract scientific entities and events from English research papers. We apply Abstract Meaning Representation (AMR) to compress the wide context and uncover a clear semantic structure for each complex sentence. In addition, we construct a sentence-level knowledge graph from an external knowledge base and use it to enrich the AMR graph, improving the model's understanding of complex scientific concepts. We use an edge-conditioned graph attention network to encode the knowledge-enriched AMR graph for biomedical IE tasks. Experiments on the GENIA 2011 dataset show that AMR and external knowledge contribute 1.8% and 3.0% absolute F-score gains respectively. To evaluate the impact of our approach on real-world problems that involve topic-specific fine-grained knowledge elements, we have also created a new ontology and an annotated corpus for entity and event extraction from the COVID-19 scientific literature, which can serve as a new benchmark for the biomedical IE community.


Introduction
The task of Biomedical Information Extraction (IE) aims to extract structured knowledge from biomedical literature, which is usually represented by an information network composed of scientific named entities, relations, and key events. (Data and source code are publicly available at https://github.com/zhangzx-uiuc/Knowledge-AMR.) It is an essential task for accelerating practical applications of the results and achievements of scientific research. For example, practical progress in combating COVID-19 depends highly on efficient transmission, assessment, and extension of cutting-edge scientific research discoveries (Wang et al., 2020a; Lybarger et al., 2020; Möller et al., 2020). In this scenario, a powerful biomedical IE system will be able to create a dynamic knowledge base from the surging number of relevant papers, making it more efficient to access the latest knowledge and use it for scientific discovery, as well as the diagnosis and treatment of patients.
IE from biomedical scientific papers presents two unique and non-trivial challenges. First, the authors of scientific papers tend to compose long sentences, where event triggers and entity mentions are usually located far away from each other within the sentence. As shown in Table 1, compared to the ACE05 dataset in the news domain, the average distance between triggers and entities is much longer in biomedical scientific papers. Therefore, it is more difficult for IE models to capture the global context with only flat sequential sentence encoders such as BioBERT (Lee et al., 2020) and SciBERT (Beltagy et al., 2019).

Dataset      Average distance   Maximal distance
ACE05-E      0.212 sentence     56 words
GENIA-2011   0.330 sentence     77 words

Table 1: Comparison of the average and maximum distance between each event-argument pair in the news domain (ACE05 dataset) and scientific papers (GENIA-2011 dataset) with the same sentence tokenizer.
Moreover, comprehending sentences from scientific papers requires external knowledge, because they contain many domain-specific unexplained common expressions, acronyms, and abbreviations that are difficult for the model to understand. For instance, as shown in Figure 1, it is nearly impossible for a typical end-to-end model, which only takes the sentence as input, to gain a clear understanding of CTF, OTF-1, and OTF-2 without background knowledge. Moreover, the complex biomedical and chemical interactions between multifarious chemicals, genes, and proteins are even harder to understand than the entities themselves.
To tackle these two challenges, we propose a novel framework for biomedical IE that integrates Abstract Meaning Representation (AMR) (Banarescu et al., 2013) and external knowledge graphs. AMR is a semantic representation language that converts the meaning of each input sentence into a rooted, directed, labeled, acyclic graph. AMR includes PropBank (Palmer et al., 2005) frames, non-core semantic roles, coreference, entity typing and linking, modality, and negation. The nodes in AMR are concepts instead of words, and the edge types are much more fine-grained than those of traditional representations such as dependency parsing and semantic role labeling. We train a transformer-based AMR semantic parser (Fernandez Astudillo et al., 2020) on biomedical scientific texts and use it in our biomedical IE model. To better handle long sentences with distant trigger and entity pairs, we use AMR parsing to compress each sentence and better capture global interactions between tokens. For example, as shown in Figure 1, the Positive Regulation event trigger "changes" is located far away from its arguments CTF, OTF-1, and OTF-2 in the original sentence, but in the AMR graph such trigger-entity pairs are linked within two hops. Therefore, it is much easier for the model to identify such events with the guidance of AMR parsing.
In addition, to make better use of external knowledge, we extract a global knowledge graph from the Comparative Toxicogenomics Database (CTDB) that covers all biomedical entities in the corpus. For each sentence, we select a minimal connected subgraph as the sentence-level KG. We use this sentence KG to enrich AMR nodes and edges, giving the model additional prior domain knowledge, especially the biomedical and chemical interactions between different genes and proteins. These fine-grained relations are important for biomedical event extraction. For example, as in Figure 1, the incorporation of the external KG can indicate that Mono Mac 6 can result in leukemia, which affects the expression of the CTF, OTF-1, and OTF-2 proteins. With this external knowledge, it is much easier for the model to identify these proteins as the arguments of a Positive Regulation event. We encode the knowledge-enriched AMR graph using an edge-conditioned graph attention network (GAT) that incorporates fine-grained edge features before conducting IE tasks. We evaluate our model on the existing GENIA-2011 benchmark dataset, where it greatly outperforms our baseline model by 4.8%. In addition to the existing GENIA-2011 benchmark, we also aim to evaluate the effectiveness of our framework on topic-specific literature. We develop a new ontology for entities and events with a large corpus from COVID-19 research papers, which is specifically annotated by medical professionals and can serve as a new benchmark for the biomedical IE community.
The major contributions of this paper are summarized as follows.
• We are the first to enrich the AMR graph with the external knowledge and use a graph neural network to incorporate the fine-grained edge features.
• We evaluate our model and create a new state-of-the-art for biomedical event extraction on the GENIA-2011 corpus.
• We develop a new dataset from COVID-19 related research papers based on a new ontology that contains 25 fine-grained entity types and 14 event types.

Overview
As shown in Figure 2, our proposed biomedical information extraction framework consists of four main steps. First, we extract a global knowledge graph (KG) that contains all entities from the corpus, and select a sentence-level knowledge subgraph for the input sentence. Then, we perform AMR parsing to construct the sentence-level AMR graph, and use the sentence knowledge subgraph to enrich the AMR graph by adding additional nodes and edges. After that, given the contextualized word embeddings, we first identify entity and trigger spans, and then conduct message passing on the knowledge-enriched AMR graph with an edge-conditioned GAT. Finally, we use feed-forward neural network classifiers for trigger and argument labeling.

Knowledge Graph Construction
Global Knowledge Graph We use the Comparative Toxicogenomics Database (CTDB), which contains fine-grained biomedical and chemical interactions between chemicals, genes, and diseases. We construct a global knowledge graph that involves all entities from the corpus with their pairwise chemical interactions. We extract these entity pairs with their biomedical interactions as triples; e.g., in Figure 1, (Mono Mac 6, results, leukemia) indicates that the Mono Mac 6 cell can result in the disease of leukemia. We merge all the extracted triples to form a global knowledge graph G_g = (V_g, E_g). Our extracted global KG consists of 39,436 nodes and 590,235 edges.
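The triple-merging step above can be sketched as follows; the triples and relation names are illustrative stand-ins for actual CTD interaction records, not the real schema:

```python
from collections import defaultdict

def build_global_kg(triples):
    """Merge (head, relation, tail) triples into a global knowledge graph.

    Returns the node set V_g and an adjacency map E_g keyed by head node.
    """
    nodes = set()
    edges = defaultdict(list)   # head -> list of (relation, tail)
    for head, rel, tail in triples:
        nodes.add(head)
        nodes.add(tail)
        edges[head].append((rel, tail))
    return nodes, edges

# Hypothetical triples mirroring the example from Figure 1.
triples = [
    ("Mono Mac 6", "results_in", "leukemia"),
    ("leukemia", "affects_expression", "CTF"),
    ("leukemia", "affects_expression", "OTF-1"),
]
V_g, E_g = build_global_kg(triples)
```

The full graph is built once over the whole corpus, so later sentence-level lookups only read from it.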

Sentence-level Knowledge Graph
Given an input sentence, we aim to generate a sentence-level KG by selecting a subgraph of the global KG that contains the external knowledge between all entities within the sentence. Given an input sentence S, we use SciSpacy to obtain all related biomedical entities, including genes, chemicals, cells, and proteins. We then link each entity mention from the sentence to the nodes in the global KG G_g = (V_g, E_g). To select the sentence subgraph from the global KG, given the set of entity mentions E = {ε_1, ..., ε_|E|} (where each ε_i is a word span), we select the connected subgraph that covers all entity mentions in E with the minimal number of nodes as the sentence KG. Note that this sentence-KG construction procedure can be accomplished in linear time with respect to the number of nodes |V_g|: we first traverse all nodes in the global KG using depth-first search and obtain all connected subgraphs of G_g in linear time; we then select the set of subgraphs that cover E and choose the one G_s = (V_s, E_s) with the minimal number of nodes as the sentence KG.
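A minimal sketch of the component-based selection described above, assuming the entity mentions have already been linked to global-KG node identifiers:

```python
def connected_components(nodes, edges):
    """Find the connected components of an undirected graph via DFS."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u] - seen)
        comps.append(comp)
    return comps

def sentence_kg(nodes, edges, mentions):
    """Pick the smallest connected component covering all entity mentions."""
    covering = [c for c in connected_components(nodes, edges)
                if mentions <= c]
    return min(covering, key=len) if covering else None
```

For example, with nodes {A, ..., E}, edges (A, B), (B, C), (D, E), and mentions {A, C}, the component {A, B, C} is returned.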

KG-enriched AMR parsing
AMR Parsing After obtaining the sentence KG, we fuse it with the AMR graph as an external knowledge enrichment procedure. Given an input sentence S = {w_1, w_2, ..., w_N}, we first perform AMR parsing and obtain a sentence-level AMR graph G_A = (V_A, E_A) with an alignment between AMR nodes and spans in the original sentence. We employ the transformer-based AMR parser (Fernandez Astudillo et al., 2020) pretrained on the Biomedical AMR corpus released on the official AMR website. Each node v^A_i = (m^A_i, n^A_i) ∈ V_A represents an AMR concept or predicate, where (m^A_i, n^A_i) denotes the corresponding span for that AMR node. For AMR edges, we use e^A_{i,j} to denote the specific relation type between nodes v^A_i and v^A_j in the AMR annotations (e.g., ARG-x, :time, :location, etc.). We randomly initialize the edge embeddings as a lookup embedding matrix E_AMR, which is optimized during end-to-end training.
Enrich AMR with sentence KG Given an AMR graph G_A and a sentence KG G_S, we fuse them into an enriched AMR graph G = (V, E) as the external reference for the subsequent information extraction tasks. In general, there are three cases for fusing each sentence-KG node v^s_i ∈ V_s into the AMR graph. First, if v^s_i represents an entity within the sentence, and there is also an AMR node v^A_j with the same span, we match v^s_i to v^A_j and add all KG edges linked to v^s_i into the AMR graph. Second, if v^s_i represents an entity within the sentence, but there is no AMR node with a matched span, we add a new node (as well as all related edges) into the AMR graph. Third, if v^s_i is an additional KG node that does not represent any entity in the sentence, we directly add this node into the AMR graph with all related KG edges. After we match and link all the sentence-KG nodes to the AMR graph, we obtain the fused graph G = (V, E). Note that this graph fusion procedure can result in multiple edges between a pair of nodes; we keep all these edges with their embeddings for the subsequent message passing procedure. The graph fusion procedure is illustrated in Figure 2.

Figure 2: Overview of our proposed framework for biomedical information extraction.
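The three fusion cases can be sketched as follows; the node identifiers, the span tuples, and the "kg:" prefix are illustrative assumptions rather than the paper's actual data structures:

```python
def fuse_graphs(amr_nodes, amr_edges, kg_nodes, kg_edges):
    """Fuse sentence-KG nodes into an AMR graph (the three cases above).

    amr_nodes: dict AMR node id -> token span; kg_nodes: dict KG node id ->
    token span, or None for a KG node with no in-sentence mention.
    """
    fused_nodes = dict(amr_nodes)
    fused_edges = list(amr_edges)          # multi-edges are kept
    kg_to_fused = {}
    amr_by_span = {span: nid for nid, span in amr_nodes.items()}
    for kid, span in kg_nodes.items():
        if span is not None and span in amr_by_span:
            # Case 1: entity with a span-matched AMR node -> merge them.
            kg_to_fused[kid] = amr_by_span[span]
        else:
            # Cases 2 and 3: in-sentence entity with no AMR match, or a
            # purely external KG node -> add a fresh node.
            new_id = f"kg:{kid}"
            fused_nodes[new_id] = span
            kg_to_fused[kid] = new_id
    # Re-attach every KG edge using the fused node identifiers.
    for u, rel, v in kg_edges:
        fused_edges.append((kg_to_fused[u], rel, kg_to_fused[v]))
    return fused_nodes, fused_edges
```

KG edges whose endpoints were merged into AMR nodes land directly on those AMR nodes, which is what makes the external relations visible during message passing.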

Node Identification and Message Passing
Contextualized Encoder Given an input sentence S, we use the BERT model pretrained on biomedical scientific texts (Lee et al., 2020) to obtain the contextualized word representations {x 1 , x 2 , · · · , x N }. If one word is split into multiple pieces by the BERT tokenizer, we take the average of the representation vectors for all pieces as the final word representation.
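The wordpiece-averaging step can be sketched as follows, assuming the tokenizer reports which original word each piece came from (plain lists stand in for BioBERT output vectors):

```python
def word_reps_from_pieces(piece_vectors, word_ids):
    """Average wordpiece vectors belonging to the same original word.

    piece_vectors: one vector (list of floats) per wordpiece;
    word_ids: for each piece, the index of the word it was split from.
    """
    n_words = max(word_ids) + 1
    dim = len(piece_vectors[0])
    sums = [[0.0] * dim for _ in range(n_words)]
    counts = [0] * n_words
    for vec, w in zip(piece_vectors, word_ids):
        counts[w] += 1
        for d in range(dim):
            sums[w][d] += vec[d]
    return [[s / counts[w] for s in sums[w]] for w in range(n_words)]
```

For example, two pieces of one word with vectors [1, 1] and [3, 3] yield the word representation [2, 2].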
Node Identification After encoding the input sentence with BERT, we first identify entity and trigger spans as candidate nodes. Similar to (Wadden et al., 2019), given the contextualized word representations, we enumerate all possible spans up to a fixed length K (we use different maximum span lengths K for entity and trigger spans), and calculate each span representation as the concatenation of the left and right endpoint representations and a trainable feature vector characterizing the span length. Specifically, given each span s_i = [start(i), end(i)], the span representation vector is

v(s_i) = [x_start(i) : x_end(i) : z(s_i)],    (1)

where z(s_i) denotes a trainable feature vector that is determined only by the span length. We use separate binary classifiers for each specific entity and trigger type to handle spans with multiple labels. Each binary classifier is a feed-forward neural network with ReLU activation in the hidden layer, trained with a binary cross-entropy loss jointly with the whole model. In the diagnostic setting of using gold-standard entity mentions, we only employ span enumeration for event trigger identification, and use the gold-standard entity set for the following event extraction steps.
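A minimal sketch of the span enumeration and span construction described above, with plain lists in place of learned embeddings (`width_feature` is a hypothetical stand-in for the trainable length feature z):

```python
def enumerate_spans(n_tokens, max_len):
    """All candidate spans (start, end), end inclusive, up to max_len tokens."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_len, n_tokens))]

def span_representation(x, span, width_feature):
    """Concatenate the endpoint vectors with a span-width feature vector.

    width_feature is indexed by (end - start), i.e. span length minus one.
    """
    start, end = span
    return x[start] + x[end] + width_feature[end - start]
```

For a 3-token sentence with max_len = 2 this yields five candidate spans; each gets one fixed-size vector regardless of its width.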
Edge-conditioned GAT To fully exploit the information from external knowledge and the AMR semantic structure, similar to (Zhang and Ji, 2021), we use an L-layer graph attention network to let the model aggregate neighbor information from the fused graph G = (V, E). We use h^l_i to denote the feature of node v_i ∈ V in layer l, and e_{i,j} to represent the edge feature vector for e_{i,j} ∈ E. To update the node features from layer l to l + 1, we first calculate the attention score for each neighbor j ∈ N_i based on the concatenation of the node features h^l_i, h^l_j and the edge feature e_{i,j}:

α^l_{i,j} = softmax_j( σ( f^l([W h^l_i : W h^l_j : W_e e_{i,j}]) ) ),
where W, W_e are trainable parameters, and f^l and σ(·) are a single-layer feed-forward neural network and the LeakyReLU activation function, respectively. Then we obtain the neighborhood information h*_i by the weighted sum of all neighbor features:

h*_i = Σ_{j∈N_i} α^l_{i,j} W* h^l_j,

where W* is a trainable parameter. The updated node feature is calculated as a combination of the original node feature and its neighborhood information,

h^{l+1}_i = γ h*_i + (1 − γ) h^l_i,

where γ controls the level of message passing between neighbors.
Note that our edge-conditioned GAT structure is similar to (Huang et al., 2020). The main difference is that (Huang et al., 2020) only uses edge features for calculating the attention score α^l_{i,j}, while we use the concatenation of the feature vectors of each edge and its pair of endpoint nodes. This better characterizes the differing importance of neighbor nodes, and thus yields better model performance. We take the last-layer feature h^L_i as the final representation of each entity or trigger.
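The attention update described above can be sketched in miniature; here a single scoring vector `a` collapses the projections W, W_e and the feed-forward scorer f^l into one dot product, so this is a toy illustration of the mechanics, not the trained model:

```python
import math

def leaky_relu(v, slope=0.01):
    return v if v >= 0.0 else slope * v

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gat_update(h, edge_feat, neighbors, a, gamma=0.5):
    """One edge-conditioned attention update over every node.

    h: node id -> feature vector; edge_feat: (i, j) -> edge vector;
    neighbors: node id -> list of neighbor ids; `a` scores the
    concatenation [h_i ; h_j ; e_ij].
    """
    new_h = {}
    for i, hi in h.items():
        if not neighbors[i]:
            new_h[i] = hi
            continue
        # Score each neighbor from [h_i ; h_j ; e_ij], then softmax.
        scores = [leaky_relu(dot(a, hi + h[j] + edge_feat[(i, j)]))
                  for j in neighbors[i]]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alphas = [e / z for e in exps]
        # Weighted neighborhood message, mixed with the old feature by gamma.
        msg = [sum(al * h[j][d] for al, j in zip(alphas, neighbors[i]))
               for d in range(len(hi))]
        new_h[i] = [gamma * msg[d] + (1 - gamma) * hi[d]
                    for d in range(len(hi))]
    return new_h
```

Because the edge vector enters the score, two neighbors with identical node features can still receive different attention weights when their connecting relations differ, which is the point of conditioning on edges.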
Message Passing Given the knowledge-enriched AMR graph G = (V, E) and the representation vectors of the extracted trigger and entity spans, we initialize the feature vectors for nodes and edges as follows. For each KG node v^s_i that does not belong to any AMR node, we initialize its feature vector using KG embeddings pre-trained on the global KG with TransE (Bordes et al., 2013). For each original AMR node v^A_i = (m^A_i, n^A_i), we first calculate its span representation according to Eq. (1), and then use a linear transformation W_A v^A_i + b_A to initialize the node feature vector h^0_i. For edge features, we use the pre-trained TransE embeddings for KG edges, and the trainable embedding matrix E_AMR for AMR relations. We use our proposed edge-conditioned GAT to conduct message passing and take the feature vectors from the final layer as the updated node representations. We obtain the final representation vectors for the trigger and entity nodes and denote them as {τ_1, ..., τ_|T|} and {ε_1, ..., ε_|E|} respectively.

Biomedical Event Extraction
Model Training Given the event trigger set T with trigger representations τ_i, and the entity set E with entity representations ε_i, we use L_I to denote the loss of the binary classifiers for event trigger and entity extraction in the node identification step. For event argument role labeling, we concatenate candidate trigger-entity pairs or trigger-trigger pairs (for nested events) and feed them into two separate FFNs (with a softmax activation function in the output layer) for role type classification, where y^{tt}_{i,j} = FFN_tt([τ_i : τ_j]) and y^{te}_{i,j} = FFN_te([τ_i : ε_j]). The overall training objective is defined in a multi-task setting, which includes the cross-entropy losses for trigger and argument classification, as well as the binary classification loss L_I.
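The trigger-entity pair labeling can be sketched as follows; `score_te` is a hypothetical stand-in for the trained FFN_te classifier, and the role names are illustrative:

```python
def label_arguments(trigger_reps, entity_reps, score_te, role_names):
    """Assign an argument role to every trigger-entity pair.

    score_te maps a concatenated pair vector [tau_i ; eps_j] to one
    score per role; the argmax role is kept for each pair.
    """
    roles = {}
    for t_id, tau in trigger_reps.items():
        for e_id, eps in entity_reps.items():
            scores = score_te(tau + eps)       # concatenation of the pair
            best = max(range(len(scores)), key=lambda k: scores[k])
            roles[(t_id, e_id)] = role_names[best]
    return roles
```

Trigger-trigger pairs for nested events would be enumerated the same way with a second scorer in place of `score_te`.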

Baselines and Ablation Variants
We consider the most recent models for biomedical event extraction: KB-Tree-LSTM (Li et al., 2019), GEANet (Huang et al., 2020), BEESL (Ramponi et al., 2020), and DeepEventMine (Trieu et al., 2020) for comparison in our experiments, and we report the precision, recall, and F1 score from the GENIA 2011 online test-set evaluation service. In addition to the previous models, we also conduct ablation studies to evaluate the contributions of the different parts of our model. We adopt the model variants BERT-Flat and BERT-AMR: BERT-Flat only uses the BERT representations without any help from AMR or the KG, and BERT-AMR denotes the model with an edge-conditioned GAT encoding the AMR graph without incorporating external knowledge.

Overall Performance
We report the performance of our model and compare it with the most recent biomedical IE models KB-Tree-LSTM (Li et al., 2019), GEANet (Huang et al., 2020), BEESL (Ramponi et al., 2020), and DeepEventMine (Trieu et al., 2020) in Table 3. In general, our KG-enriched AMR model achieves slightly higher performance than the state-of-the-art model DeepEventMine, and greatly outperforms all other previous models for biomedical event extraction. To further measure the impact of each individual part of our model, we also introduce two model variants for the ablation study. Compared with simply finetuning a flat BERT model, AMR parsing contributes a 1.84% absolute gain in F1 score, while the incorporation of the external knowledge graph contributes 2.95%. We also report the overall development set F1 scores without using gold-standard entities, and compare the performance with BEESL in Table 4. Our model performs significantly better than the BEESL model, which shows that our model can better handle practical scenarios without gold-standard entities.

Model                          P      R      F1
Tree-LSTM (Li et al., 2019)    67.01  52.14  58.65
GEANet (Huang et al., 2020)    64.61  56.11  60.06
BEESL (Ramponi et al., 2020)   –      –      –

Table 4: Overall dev F-score (%) of biomedical extraction on the GENIA 2011 dataset without using gold-standard entities.

Case Study on COVID-19 Dataset
COVID-19 Dataset To evaluate the impact of our approach on real-world problems, besides the GENIA dataset, we also develop a new dataset from research papers related to COVID-19, specifically labeled by medical professionals. We selected 186 full-text articles with 12,916 sentences from PubMed and PMC. Three experienced annotators who are biomedical domain experts participated in the annotation, and the pairwise Cohen's Kappa scores between the annotators are 0.79, 0.84, and 0.74 respectively. The pre-defined entity and event type distributions of this dataset are shown in Table 6.
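The reported pairwise agreement scores follow the standard Cohen's kappa formula; a minimal implementation for two annotators over the same items:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance)."""
    n = len(labels_a)
    cats = set(labels_a) | set(labels_b)
    # Observed agreement rate.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in cats)
    return (p_o - p_e) / (1.0 - p_e)
```

Values around 0.74-0.84, as reported above, are conventionally read as substantial agreement.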

Results
We evaluate our proposed model by removing the event argument labeling procedure to accommodate a scenario limited to entity and event trigger labeling; that is, we remove the argument role classifiers FFN_tt and FFN_te so that the overall training loss in Eq. (3) only contains the first two terms for span identification and event trigger classification. As shown in Table 5, our model achieves a 78.05% overall F1 score, with 83.60% F1 on the entity extraction task and 72.37% F1 on event extraction.
The entity extraction performance on the COVID dataset is lower than the typical coarse-grained entity extraction performance of BERT-like models on other datasets (e.g., our model obtains around 86% F1 for entity extraction on the GENIA-2011 development set). This is probably because our proposed COVID-19 dataset is more challenging, with more fine-grained biomedical entity and event types.

Qualitative Analysis
We select two typical examples in Table 7 to show how KG enriched AMR parsing helps to improve the performance of biomedical IE.
In the first example, the flat model fails to identify CAII as an entity of the bind event, probably due to the long distance between the trigger bind and the argument CAII (the model successfully detects the other two arguments V-erbA and C-erbA because they are much nearer). With the help of AMR parsing, the model successfully links CAII to the bind event, since in the AMR graph the three entities C-erbA, V-erbA, and CAII are located within the same number of hops from the bind trigger. However, the model still cannot recognize CAII as the theme of transcription, probably because it cannot resolve what "whose" refers to in the sentence. With the help of external knowledge, the model knows in advance that V-erbA can inhibit the transcription of CAII, and is thus able to identify CAII as the theme of the transcription event.
In the second example, the flat model is confused about which entity belongs to which event between two binding events in the same sentence. Here, the AMR parsing provides a clear tree structure and guides the model to correctly link the event-entity pairs (i.e., heterodimers with RAR beta, binding with VDR). However, the BERT-AMR model still fails to identify heterodimers as the theme of stimulated. With the further help of the external KG, the model knows in advance that RA can stimulate the generation of RAR beta heterodimers, and thus it is able to correctly identify a positive regulation between these two triggers.

Remaining Challenges
We compare the predictions from our model with the gold-standard annotations on the development set and discover the following typical remaining error cases.

Non-verb Event Triggers
Most biomedical events are triggered by verbs (bind, express, etc.) or their noun forms (binding, expression, etc.). However, there are also events triggered by adjectives (e.g., subsequent), proper nouns (e.g., mRNA, SiRNA), and even prepositions (e.g., from) and conjunctions (e.g., rather than). Our model misses many of these non-verb event triggers due to insufficient training examples.

Misleading Verb Prefix
We also find that the prefix of a verb can sometimes be misleading for event trigger classification, especially for Negative Regulation events. Many Negative Regulation events are triggered by words with certain prefixes (in- or de-), e.g., inactivation, inactivated, decrease, degradation, etc., representing negative interactions. As a result, the model mistakenly labels many other words with the same prefixes as Negative Regulation event triggers. For example, in the sentence "Dephosphorylation of 4E-BP1 was also observed ...", the word dephosphorylation should not be classified as a Negative Regulation event even though it has a de- prefix, because dephosphorylation denotes the inverse chemical process of phosphorylation rather than negative regulation between events or proteins. This is probably because the BERT tokenizer breaks such words into the pieces "de" and "phosphorylation", encouraging BERT models to learn misleading patterns.

Related Work
Biomedical Information Extraction A number of previous studies contribute to biomedical event extraction with various techniques, such as dependency parsing (McClosky et al., 2011; Li et al., 2019), external knowledge bases (Li et al., 2019; Huang et al., 2020), joint inference of triggers and arguments (Poon and Vanderwende, 2010; Ramponi et al., 2020), Abstract Meaning Representation (Rao et al., 2017), search-based neural models (Espinosa et al., 2019), and multi-turn question answering (Wang et al., 2020b). Recently, to handle nested biomedical events, BEESL (Ramponi et al., 2020) models biomedical event extraction as a unified sequence labeling problem for end-to-end training, and DeepEventMine (Trieu et al., 2020) proposes a neural network based classifier to decide the structure of complex nested events. Our model also follows an end-to-end training pipeline, but additionally utilizes fine-grained AMR semantic parsing and external knowledge to improve performance.

Utilization of External Knowledge
In terms of utilizing external knowledge, (Li et al., 2019) proposes a knowledge-driven Tree-LSTM framework to capture dependency structures and entity properties from an external knowledge base. More recently, GEANet (Huang et al., 2020) introduces a Graph Edge-conditioned Attention Network that incorporates domain knowledge from the Unified Medical Language System (UMLS) into the IE framework. The main difference of our model is that we use fine-grained AMR parsing to compress the wide context, and use an external KG to enrich the AMR to better incorporate domain knowledge. Incorporating external knowledge is also widely used in other tasks such as relation extraction (Chan and Roth, 2010; Cheng and Roth, 2013) and QA for domain-specific (science) questions (Pan et al., 2019).
Biomedical Benchmarks for COVID-19 (Lo et al., 2020) releases a dataset containing open-access biomedical papers related to COVID-19. Much research has been built on this dataset, including information retrieval (Wise et al., 2020), entity recognition (Wang et al., 2020b), distant supervision for fine-grained biomedical named entity recognition to support automatic information retrieval indexing and evidence mining (Wang et al., 2020c), and an end-to-end Question Answering (QA) system for COVID-19 with domain-adaptive synthetic QA training (Reddy et al., 2020). Our COVID-19 dataset will further advance the field in developing effective IE techniques specifically for the COVID-19 domain.

Conclusions and Future Work
In this paper, we propose a novel biomedical Information Extraction framework to effectively tackle two unique challenges of scientific-domain IE: complex sentence structure and unexplained concepts. We utilize AMR parsing to compress wide contexts, and incorporate external knowledge into the AMR. Our proposed model produces significant performance gains over state-of-the-art methods. In the future, we intend to exploit tables and figures in the scientific literature for multimedia representation. We also plan to incorporate coreference graphs among sentences to further enrich contexts, and to continue exploring richer information from external knowledge bases to further improve the model's performance.