CONNER: A Cascade Count and Measurement Extraction Tool for Scientific Discourse

This paper presents our second-place contribution to SemEval 2021 Task 8: MeasEval. The task requires identifying counts and measurements in scientific discourse, including quantities, entities, properties, qualifiers, units, and modifiers, as well as their mutual relations. It can be cast as a joint entity and relation extraction problem. Accordingly, we propose CONNER, a cascade count and measurement extraction tool that identifies entities and the corresponding relations in a two-step pipeline. We provide a detailed description of the proposed model, and further investigate the impact of its essential modules and our in-process technical schemes.


Introduction
Clinicians are currently coping with a massive amount of information, both from raw experimental data and from the scientific publications recording their results. However, the ever-expanding information sources have exceeded the ability of clinicians to digest and utilize them properly (Botsis et al., 2011; Cao et al., 2018; Zhou et al., 2010). Clinical information extraction tools in the text-mining field (De Bruijn et al., 2011; Li and Huang, 2016; Yehia et al., 2019; Mulyar and McInnes, 2020) aim to alleviate the clinician's burden by better utilizing the knowledge contained in scientific discourse, accessible in the form of natural human language. Automating the process of understanding the relevant parts of the scientific literature allows for effective searching, and enables inference of new information and hypothesis generation for clinical research.

Counts and measurements are an important part of the information contained in scientific discourse (Harper et al., 2021). However, extracting these count and measurement entities and their interactions is challenging, since the way scientists write them can be ambiguous and inconsistent, and the location of this information relative to the measurement can vary greatly.
In this paper, we focus on the SemEval 2021 MeasEval task, which is composed of five sub-tasks covering span extraction, classification, and relation extraction. As shown in Figure 1, the task first demands identifying all quantity spans in a given paragraph of scientific text. For each identified quantity, we need to extract the measured entity, measured property, and qualifier that correspond to it. Besides, the relationships between the quantity, measured entity, measured property, and qualifier are also required to be identified. Lastly, the unit and type of the quantity also need to be recognized if they exist.
[Figure 1: annotated example sentences, e.g., "However, some elements are only present for the largest bed inventory of 13 kg e.g., Ti, Cr, and Mn." and "However, all elements included within Fig. 5 can be considered to be at very low concentrations of <2ppm."]

To tackle this challenging task, we propose CONNER, a cascade count and measurement extraction tool, which primarily contains four components: (1) the model encoder obtains representations of both the scientific paragraph text and the entities; (2) the quantity tagger extracts all potential quantity entities within the paragraph; (3) the relation-specific object tagger recognizes the possible measured entities, measured properties, and qualifiers, as well as their mutual relations; (4) the unit and modifier extractor identifies units and modifiers via a rule-based approach and a simple classifier.
The rest of the paper is organized as follows. In Section 2, we elaborate on the whole workflow of CONNER; a detailed analysis of our experiments and results follows in Section 3. The paper is concluded in Section 4.

Overview
CONNER is designed to identify all possible aspects of quantity items, including Quantity, MeasuredEntity, MeasuredProperty, Qualifier, Unit and Modifier, as well as their relations.
Inspired by prior work on cascade tagging, we adopt a cascade framework that models relations as functions mapping a quantity to the other objects in a sentence. This is contrary to the conventional perspective on relational triple extraction (Gupta et al., 2016; Adel and Schütze, 2017; Zeng et al., 2018; Zheng et al., 2017), which first identifies all possible entities and then predicts their relations. Based on our observation, roughly 90% of the entities in our task overlap, which a cascade framework can handle smoothly. Besides, our proposed method generates only a limited number of negative samples, since there are only three types of relations in our task, which further mitigates the long-tail problem (Zhang et al., 2019; Li et al., 2020).
The basic idea of CONNER is a two-step pipeline, as shown in Figure 2. We first deploy a quantity tagger to identify all possible Quantity spans in the input paragraph (Section 2.3); for each predicted Quantity, we check all potential relations to see whether a relation can associate a MeasuredEntity, MeasuredProperty, or Qualifier with the Quantity (Section 2.4). Rather than using the cascade framework, we adopt a rule-based method and a simple classifier for the extraction of units and modifiers (Section 2.5). We describe the detailed workflow below.

Model Encoder
The model encoder aims at obtaining the semantic representations H of the input paragraph text X, which are further used in the subsequent tagging modules. For input paragraphs that exceed our pre-defined maximum length, we split them into pieces at full stops and encode the pieces separately. We experiment with both RoBERTa (Liu et al., 2019) and BERT (Devlin et al., 2018) to encode the context information. We write H = Encoder(X) for brevity, and L denotes the length of the input paragraph.
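The splitting strategy for over-long paragraphs can be sketched as follows. This is a minimal illustration under our own assumptions: it splits at full stops and measures length in characters, whereas the actual encoder would measure length in subword tokens.

```python
def split_paragraph(text, max_len=350):
    """Split an over-long paragraph into pieces at full stops.

    Minimal sketch of the strategy described above; a real implementation
    would measure length in subword tokens, not characters.
    """
    if len(text) <= max_len:
        return [text]
    pieces, current = [], ""
    for sentence in text.split(". "):
        candidate = (current + ". " + sentence) if current else sentence
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

Each resulting piece is then fed to the encoder independently, i.e., H = Encoder(piece).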

Quantity Tagger
The lower-level tagging model shown in Figure 2 is designed to predict all potential quantities in the input paragraph. It is an ensemble model incorporating a CRF layer and a PointerNet layer. We illustrate the whole workflow as follows.
PointerNet layer. Deriving from the Pointer Network (Vinyals et al., 2015), two identical binary classifiers are adopted to detect the start and end positions of quantities, respectively. Each token is fed into the binary classifiers to predict whether the current token is aligned to a start or end position of a quantity span. Formally, given a contextual representation h_i ∈ H, we have:

p_i^start = σ(W_start h_i + b_start),
p_i^end = σ(W_end h_i + b_end),

where W_start and W_end are trainable weights, b is the bias term, and σ is the sigmoid activation function. p_i^start and p_i^end denote the probabilities of identifying the i-th token in the input paragraph as the start or end position of a quantity, respectively. We set a threshold score of 0.7: the current token is assigned 1 if its probability surpasses the threshold, and 0 otherwise. The loss function of the quantity tagger is:

L_quantity = − Σ_{i=1}^{L} Σ_{t ∈ {start, end}} [ y_i^t log p_i^t + (1 − y_i^t) log(1 − p_i^t) ],   (3)

where L denotes the length of the input paragraph and y_i^{start,end} are the ground-truth labels. When multiple quantities appear in the same paragraph, we adopt the nearest start-end pair match principle to decide the span of each quantity based on these results.
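The thresholding and nearest start-end pair matching can be sketched as follows (a simplified illustration; the function name is ours):

```python
def decode_spans(p_start, p_end, threshold=0.7):
    """Decode quantity spans from per-token start/end probabilities.

    Applies the 0.7 threshold, then pairs each start position with the
    nearest end position at or after it (nearest start-end pair match).
    """
    starts = [i for i, p in enumerate(p_start) if p >= threshold]
    ends = [i for i, p in enumerate(p_end) if p >= threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, min(following)))
    return spans
```

With two detected starts at positions 0 and 2 and ends at 1 and 3, this yields the non-overlapping spans (0, 1) and (2, 3).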
CRF layer. In this layer, we consider quantity recognition as a sequence-labeling problem. We select BIOS (Beginning, Inside, Outside, Single) as our label schema. Accordingly, given the representation sequence H = (h_1, h_2, ..., h_L), we adopt a conditional random field (CRF) model (Zheng et al., 2015), which defines the conditional probability distribution P(Y|H) of a label sequence Y given the contextual word representations H. We maximize the log-probability during training. In decoding, we set the transition cost of invalid transitions to negative infinity. The expected label sequence Y = (y_1, y_2, ..., y_L) is predicted based on maximum scores during decoding.
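A minimal sketch of how invalid BIOS transitions are blocked at decoding time; the 0.0 scores below are placeholders standing in for the learned transition scores:

```python
import math

TAGS = ["B", "I", "O", "S"]

def is_valid_transition(prev_tag, next_tag):
    # Under the BIOS schema, "I" (Inside) may only follow "B" or "I";
    # every other adjacent tag pair is structurally valid.
    if next_tag == "I":
        return prev_tag in ("B", "I")
    return True

# Transition costs used during CRF decoding: invalid transitions are set
# to negative infinity so the Viterbi search can never select them.
transitions = {
    (prev, nxt): (0.0 if is_valid_transition(prev, nxt) else -math.inf)
    for prev in TAGS
    for nxt in TAGS
}
```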
Ensemble. The experimental results of the CRF layer and the PointerNet layer are relatively comparable (see Table 4). However, an empirical observation of the predicted results suggests that the CRF layer tends to extract shorter spans, while the PointerNet layer does the opposite. Presumably, an ensemble model can gain better results since the dataset contains both long and short entity spans. We design our model ensemble around predicted quantities that partly overlap. To be more specific, we first take the predicted quantities from the PointerNet layer as our final result; in addition, if a predicted quantity from the CRF layer strictly does not appear in the PointerNet result, i.e., there is no overlap between the two quantities, we add it to our final result as well.
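This first ensemble scheme can be sketched as follows, operating on (start, end) character offsets:

```python
def overlaps(a, b):
    # Two (start, end) spans overlap if they share at least one position.
    return a[0] <= b[1] and b[0] <= a[1]

def ensemble_spans(pointer_spans, crf_spans):
    """Merge per the scheme above: keep every PointerNet span, then add
    each CRF span that has no overlap with any PointerNet span."""
    merged = list(pointer_spans)
    for span in crf_spans:
        if not any(overlaps(span, kept) for kept in pointer_spans):
            merged.append(span)
    return merged
```

For example, a CRF span (3, 6) overlapping a PointerNet span (0, 5) is discarded, while a disjoint CRF span such as (20, 25) is kept.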

Relation-specific Object Tagger
The upper-level tagging module identifies the MeasuredEntity, MeasuredProperty and Qualifier, as well as the involved relations, with respect to the quantities obtained in the previous section. It consists of a set of relation-specific taggers with the same structure as the lower-level quantity tagger, one for each possible relation. All object taggers identify the corresponding object for each detected quantity simultaneously.
Unlike the quantity tagger, which uses the representation vector h_i alone as input, the relation-specific object tagger also takes the quantity semantic features into account. Given the contextual representation h_i of the current token, the detailed tagging operations are:

p_i^start = σ(W_start^r (h_i + v_quantity^k) + b_start^r),
p_i^end = σ(W_end^r (h_i + v_quantity^k) + b_end^r),

where p_i^start and p_i^end denote the probabilities of predicting the current token as the start or end position of an object, and v_quantity^k is the representation of the k-th identified quantity obtained from the model encoder in Section 2.2.
We iterate over all possible relations for the given quantities. Accordingly, given the token representation h_i and quantity k, the loss function of the relation-specific object tagger is:

L_object = − Σ_{r ∈ R} Σ_{i=1}^{L} Σ_{t ∈ {start, end}} [ y_i^t log p_i^t + (1 − y_i^t) log(1 − p_i^t) ],

where r ∈ R denotes the r-th relation. The remaining symbols are the same as in Equation 3. Note that the tags y_i^{start,end} = 0 if the objects are empty.
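The tagging operation can be sketched with NumPy as follows. It is a minimal illustration under our own assumptions: the quantity representation is additively fused with each token representation, and the weight vectors and biases are hypothetical per-relation parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def object_tagger(H, v_quantity, W_start, b_start, W_end, b_end):
    """Relation-specific object tagging sketch.

    H: (L, d) contextual token representations from the encoder
    v_quantity: (d,) representation of the k-th identified quantity
    W_start, W_end: (d,) per-relation weight vectors (hypothetical)
    b_start, b_end: scalar biases

    The quantity representation is added to every token representation
    before the start/end binary classifiers are applied.
    """
    fused = H + v_quantity  # broadcast: inject quantity semantics
    p_start = sigmoid(fused @ W_start + b_start)
    p_end = sigmoid(fused @ W_end + b_end)
    return p_start, p_end
```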

Unit and Modifier Extractor
Unit extraction. We design a set of rules to identify the unit corresponding to each predicted quantity. The detailed schema is presented in Algorithm 1.

Algorithm 1 Unit Method
Require: Quantity Q, which is a string of n characters.

Modifier Classification. Since no relation is attached to the modifier in the input paragraph, we cannot apply the relation-specific object tagger to modifier extraction. Given a candidate quantity token x_i, we select its n-gram contextual tokens {x_{i−n}, x_{i−n+1}, ..., x_i, x_{i+1}, ..., x_{i+n}}, concatenate them as the model input, and introduce a plain classifier to predict the label:

y_i = softmax(W c_i + b),

where c_i denotes the representation of the quantity and its contextual tokens after the BERT encoder, and y_i is the predicted modifier label for the current token x_i. The training loss is the conventional cross-entropy loss, which we do not elaborate on due to the space limit.
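A sketch of rule-based unit extraction in the spirit of Algorithm 1. The unit list and regular expression below are illustrative placeholders of our own; the actual rule set in Algorithm 1 is more extensive.

```python
import re

# Illustrative (not exhaustive) list of unit surface forms.
KNOWN_UNITS = ["ppm", "kg", "mg", "mm", "cm", "km", "m", "s", "%"]

def extract_unit(quantity):
    """Strip the numeric prefix of a quantity string (optionally preceded
    by a modifier symbol such as '<') and match the remainder against
    known unit surface forms, longest first."""
    match = re.match(
        r"^[<>~]?\s*[\d.,]+(?:\s*-\s*[\d.,]+)?\s*(.*)$",
        quantity.strip(),
    )
    if not match:
        return None
    remainder = match.group(1).strip()
    for unit in sorted(KNOWN_UNITS, key=len, reverse=True):
        if remainder.startswith(unit):
            return unit
    return None
```

For instance, this recovers "kg" from "13 kg" and "ppm" from "<2ppm" in the Figure 1 examples; matching longest units first prevents "mg" from being truncated to "m".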

Dataset
This SemEval evaluation has released the dataset online 1, which includes a text file for each paragraph of scientific text along with annotations. As shown in Table 1, the proportions of the NEO, EPO and SEP overlap categories are 9.3%, 0% and 90.7%, respectively, across the train/dev/test sets. This indicates the merit of applying CONNER to our task, since it can naturally handle overlapping entities.

Experimental Settings
We adopt a mini-batch mechanism to train our model with a batch size of 8. The fine-tuning learning rate of the pretrained language model is set to 2e-5, and the CRF decoder learning rate is set to 5e-3; these hyper-parameters are determined on the validation set. We also complete words with wrong boundaries using designed rules, e.g., "emain mostly neutral" in the raw text is corrected to "remain mostly neutral". The maximum sentence length is set to 350. The n-gram size n is set to 0. We adopt Adam (Kingma and Ba, 2014) for optimization.

Main Experiments
The experimental results on the test set for each entity category and relation are listed in Table 2. Quantity extraction outperforms the other entity categories by a large margin in named entity recognition, while HasQuantity naturally achieves the best result in the relation extraction task.

Auxiliary Experiments
While building our proposed system, we tested different schemes for each module of our model and conducted comparative experiments; the scheme with the best performance was selected as the final module of CONNER. We present the in-depth analysis and experimental results below.
Model encoder. In Subsection 2.2, we adopt either BERT-base or RoBERTa-base as our model encoder. To examine the performance of the different model encoders, we conduct experiments in the quantity identification stage for both entity and relation identification. As shown in Table 3, RoBERTa-base consistently outperforms BERT-base, so it is selected as our final model encoder.

Settings of the ensemble scheme. We tested the CRF layer and the PointerNet layer independently; they show comparable results, as listed in Table 4. As mentioned in Subsection 2.3, combining the results of the CRF and PointerNet layers can make the best use of both models, and the results verify our assumption: the ensemble models consistently outperform the singular models. We also tried two different ensemble approaches for the quantity tagger. The first is as illustrated in Subsection 2.3. The second approach is simpler: we take the union of the predicted quantities of the CRF layer and the PointerNet layer and remove duplicates as our final prediction. The experimental results in Table 4 suggest that the first ensemble scheme achieves the best result, so it is selected as our final ensemble scheme.

Settings of n-gram. The n-gram size can affect the model performance to some extent, so we tested introducing different lengths of context for modifier extraction. As shown in Table 5, the model achieves its best performance, an F1 score of 45.13%, when n is 0. We speculate that the underlying reason is that the model is not capable of capturing valid semantics from contextual tokens due to the limited number of modifiers in the whole dataset.

Conclusion
We proposed CONNER, a cascade count and measurement extraction tool that jointly identifies quantities and their attached items, as well as the corresponding relations, for SemEval 2021 Task 8: MeasEval. Our model extracts these entities and relations with a two-step pipeline method. We also explored various technical schemes during the competition and selected those that performed best in our experiments, which helped us win second place in the final ranking.