Fine-grained Typing of Emerging Entities in Microblogs

Analyzing microblogs, where we post what we experience, enables us to perform various applications such as social-trend analysis and entity recommendation. To track emerging trends in a variety of areas, we want to categorize information on emerging entities (e.g., Avatar 2) in microblog posts according to their types (e.g., Film). We thus introduce a new entity typing task that assigns a fine-grained type to each emerging entity when a burst of posts containing that entity is first observed in a microblog. The challenge is to perform typing from noisy microblog posts without relying on prior knowledge of the target entity. To tackle this task, we build large-scale Twitter datasets for English and Japanese using time-sensitive distant supervision. We then propose a modular neural typing model that encodes not only the entity and its contexts but also meta-information in multiple posts. To type 'homographic' emerging entities (e.g., 'Go' can refer to an emerging programming language or a classic board game), whose contexts are noisy, we devise a context selector that finds related contexts of the target entity. Experiments on the Twitter datasets confirm the effectiveness of our typing model and the context selector.


Introduction
Microblogs enable us to instantly share a wider variety of topics than news streams (Graus et al., 2018) and have become one of the primary sources for acquiring new information. To analyze this huge volume of posts for applications such as social-trend analysis and entity recommendation, it is necessary to extract entity units from them and classify their types using techniques such as named entity recognition (NER) and entity linking (Weikum et al., 2020). However, newly 'emerging' entities (e.g., Avatar 2) are difficult to handle because they do not exist in the training data of supervised models or in knowledge bases (KBs), and valuable information about these entities is often thrown away. Motivated by this background, Akasaki et al. (2019) (§ 2.1) defined emerging entities as entities that appear in contexts that emphasize their novelty, and attempted to discover emerging entities from microblogs. To extract emerging entities, they exploited the fact that entities appear in characteristic contexts when they first emerge (e.g., new games often appear with "trailer," "release," and a console name (Figure 1)) (§ 3.1), and developed a method of discovering them from microblogs. Although their method detected emerging entities promptly, typing those emerging entities is still necessary for use in downstream applications.
Existing studies on entity typing, however, focus on non-emerging (or prevalent) entities (Ling and Weld, 2012; Shimaoka et al., 2017; Xin et al., 2018; Obeidat et al., 2019; Ali et al., 2020) (§ 2.2). Most of them classify single mentions of entities into their context-dependent types. To complement scarce context, many studies rely on language resources such as KBs to narrow down the candidate types. Unfortunately, those resources are not available for newly appearing entities, and it is unrealistic to perform accurate mention-level typing with these methods on a short and noisy microblog post.
We thus design a task of identifying a fine-grained entity type from a burst of posts about the target entity (Figure 1, § 3.2), assuming that the target mention is detected in advance. This is a more realistic setting for typing emerging entities than conventional mention-level typing.
To build training data for this task (§ 3.3), we collect emerging entities and their contexts for English and Japanese using distant supervision (Mintz et al., 2009; Akasaki et al., 2019). To evaluate typing methods, we manually build test data for two types of emerging entities: homographic and non-homographic; homographic entities share names with other words (e.g., 'Go' for a board game, a programming language, and a verb), and consequently their contexts are contaminated.
We then propose a modular entity typing model that performs multi-instance (MI) learning (Riedel et al., 2010; Yaghoobzadeh et al., 2018) (§ 4.1). In addition to the entity's contexts and its surface form, this model leverages meta-information such as URLs and usernames, exploiting the characteristics of the microblog domain. Because entities can have homographs, it is risky to use all the posts obtained by simple string matching as contexts for typing. We thus propose to find and use emerging contexts, since two emerging entities with the same name are unlikely to emerge in a short period of time and such contexts are therefore useful for typing (§ 4.2).
We finally evaluate our typing model on the above English and Japanese Twitter datasets (§ 5). Experimental results confirm that our model outperforms a baseline model that performs MI-learning with randomly selected posts in training and testing. We demonstrate that selectively using emerging contexts and meta-information is particularly important when typing homographic emerging entities.
Our contributions are as follows:
• We set up a task of fine-grained typing of emerging entities in microblogs (§ 3.2).
• We built two large-scale Twitter datasets for English and Japanese ( § 3.3). We will release them to facilitate future studies.
• We proposed an entity typing model ( § 4.1) and a context selection model ( § 4.2) that outperformed a baseline with MI-learning ( § 5).

Related Work
In this section, we first review existing studies on the definition and detection of emerging entities. We then explain the existing task settings of entity typing and discuss their limitations.

Emerging Entity Detection
Although there are studies that find "emerging" entities (Nakashole et al., 2013;Hoffart et al., 2014;Wu et al., 2016;Derczynski et al., 2017), most of them in fact consider out-of-KB entities, which include not only emerging entities that are not prevalent (newly appeared and yet not widely known) in the world but also prevalent entities that are absent from the incomplete KBs such as Wikipedia.
Although we do not handle prevalent out-of-KB entities in this study, we intend to type entities before they become prevalent in a microblog.
To target only truly emerging entities, Akasaki et al. (2019) defined emerging entities as those which appear in emerging contexts that emphasize their novelty ( § 3.1). With this definition, they developed a method called time-sensitive distant supervision, which uses time-stamps of microblogs to collect early posts (contexts) in which KB entries (entities) appear. Using the datasets collected for Japanese, they trained an emerging entity recognizer, which successfully discovered various emerging entities more than one year before their registrations into Wikipedia.
In this study, we adopt the definition of emerging entities proposed by Akasaki et al. (2019) and conduct time-sensitive distant supervision to automatically construct large-scale English and Japanese Twitter datasets for typing emerging entities.

Entity Typing
Traditionally, named entity recognition (Sang and De Meulder, 2003; Ritter et al., 2011; Weischedel et al., 2013; Ma and Hovy, 2016; Akbik et al., 2019) jointly performs recognition and typing of entity mentions in text. However, most NER models require costly training data in which all entities in the text are fully annotated. Indeed, many studies adopt fewer than ten coarse types (e.g., person, location, and organization) (Mai et al., 2018).
Focusing on fine-grained entity typing, recent studies adopted distant supervision (Mintz et al., 2009) that automatically annotates entities with KB categories, and tackled the task of classifying single mentions of entities with their types in a context (Ling and Weld, 2012). This allows us to exploit resource-hungry neural models (Shimaoka et al., 2017) and knowledge of the target entity derived from KBs (Obeidat et al., 2019;Xin et al., 2018) or a large corpus (Del Corro et al., 2015). Although these methods succeeded in mitigating context scarcity and typing entities accurately, they are not effective when typing emerging entities that are absent from the KBs and the corpus.
To enumerate all possible types for out-of-KB entities, Lin et al. (2012) and Nakashole et al. (2013) performed entity-level entity typing (as multi-label classification). They extracted local contexts (patterns) from multiple sentences (contexts) in which entities appeared, and propagated types from in-KB entities that exhibit similar patterns. However, this approach needs massive contexts to obtain reliable patterns. Yaghoobzadeh et al. (2018) and Xu et al. (2018) elaborate on these methods by using embeddings of entities instead of patterns and by encoding actual contexts with a neural network. However, this approach cannot be directly used to type emerging entities since it is difficult to collect contexts for emerging entities; the entity linking they used to collect contexts requires KBs that are not available for emerging entities.
In this study, in order to type emerging entities in a microblog as early as possible, we set up a task of entity-level fine-grained typing of emerging entities from a burst of posts ( § 3.2). We build Twitter datasets for this task ( § 3.3) and develop an effective typing method ( § 4).

Task and Datasets
This section first introduces the definition of emerging entities (Akasaki et al., 2019) and then defines our task of typing emerging entities. Finally, we describe our dataset for this task.

Definition of Emerging Entity
We adopt the same definition of emerging entity as Akasaki et al. (2019) to focus on truly emerging entities. Inspired by the fact that microblog users mention emerging entities that are not yet well known in characteristic contexts (emerging contexts), they defined the following:
Emerging contexts. Contexts in which the writers assume that the readers do not know of the existence of the entities.
Emerging entities. Entities in the state of being still observed in emerging contexts.
They built a Japanese dataset of emerging entities with emerging contexts by collecting early time-stamped posts of Wikipedia entities from Twitter by using time-sensitive distant supervision. Since their dataset does not include type information, we reconstruct it with types in English and Japanese from scratch.

Task Settings
Inspired by the related studies on entity typing (§ 2.2) and the definition of emerging entities, we design the task of emerging entity typing. We take the following points into consideration:
1) For applications such as social-trend analysis, we want to type emerging entities as soon as they appear.
2) Since microblog posts are short and noisy, we practically need more than one post for typing; in fact, the accuracy of Twitter NER is very low (29.7%) for out-of-vocabulary entities (Fukuda et al., 2020).
3) Emerging entities show an early burst of posts around the time of their introduction into public discourse (Graus et al., 2018).
These considerations lead us to the following task setting:
Fine-grained emerging entity typing. Given an entity and a burst of posts containing the entity, the goal of the task is to predict the single type of the entity as multi-class classification.
We assume a single type for emerging entities since two entities with the same name are unlikely to emerge simultaneously in a short period of time. As for the burst, to simplify the task, we split posts into daily bins defined by the UTC+0 time zone and considered a burst to have occurred the first time an entity string appeared more than 10 times in a bin.
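The burst criterion above can be sketched as follows. This is an illustrative reimplementation, not the authors' code, and the (timestamp, text) post format is an assumption:

```python
from collections import defaultdict
from datetime import datetime, timezone

def first_burst_day(posts, entity, threshold=10):
    """Return the first UTC day on which `entity` appears in more than
    `threshold` posts, or None if no burst occurs.
    `posts`: iterable of (unix_timestamp, text) pairs (assumed format)."""
    counts = defaultdict(int)
    for ts, text in posts:
        if entity in text:
            day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
            counts[day] += 1  # posts per UTC daily bin
    for day in sorted(counts):  # scan bins in chronological order
        if counts[day] > threshold:
            return day
    return None
```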
There are two challenges in this task: 1) How to perform accurate typing in situations where we cannot assume the existence of emerging entities in language resources such as KBs and massive contexts. 2) How to deal with homographic emerging entities where a simple string match would cause contamination of contexts for the target entity.

Dataset Construction
We construct training, development, and test data for our task, following the above definition and task settings. We adopt Twitter as the microblog platform and target English and Japanese, the top two languages on Twitter (Alshaabi et al., 2021). We use our archive of Twitter posts, which was retrieved using the official Twitter APIs and consists of more than 50B posts (32% English and 20% Japanese; this distribution does not deviate much from the actual Twitter data (Alshaabi et al., 2021)). In the following, we explain how we automatically create the training and development data and how we manually build the test data for non-homographic and homographic emerging entities.

Training Data
To create the training data, we used time-sensitive distant supervision (Akasaki et al., 2019) to collect the contexts of Wikipedia entities at the time they emerged. For both English and Japanese, we gathered the titles of articles registered in Wikipedia from Mar. 11th, 2012 to Dec. 31st, 2015 as candidates of emerging entities. To remove entities that may not be emerging, we discarded titles that were posted fewer than 10 times. Since an entity string (e.g., 'Go') may refer to multiple entities (a programming language and a board game) or to existing words (a verb), we also discarded titles that appeared 10 times or more in the period from Mar. 11th, 2011 to Mar. 10th, 2012, to avoid contamination with non-emerging contexts. Next, we retrieved from our Twitter archive all posts from Mar. 11th, 2012 to Dec. 31st, 2019 in which each of the collected entities appeared. From these data, we collected 50 posts up to the date of the first burst of each entity as emerging contexts, and another 50 posts from one year after the time of the initial collection as prevalent contexts. We used the prevalent contexts as negative examples for the context selection model and for pretraining the typing model.
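The last collection step can be sketched as follows, assuming each entity's posts are available as date-stamped pairs (a simplified illustration of the scheme, not the authors' pipeline):

```python
from datetime import timedelta

def split_contexts(posts, burst_day, n=50, gap_days=365):
    """Split an entity's chronologically sorted (day, text) posts into
    emerging contexts (up to n posts on or before the first burst day)
    and prevalent contexts (up to n posts from `gap_days` later on).
    The (day, text) format is an assumption for illustration."""
    emerging = [t for d, t in posts if d <= burst_day][:n]
    prevalent = [t for d, t in posts
                 if d >= burst_day + timedelta(days=gap_days)][:n]
    return emerging, prevalent
```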
We mapped the collected entities to their corresponding fine-grained types in the DBpedia (Auer et al., 2007) ontology; for example, the entity "Spider-Man: Homecoming" is mapped to the type "Film." For analysis purposes, we manually grouped the mapped types into coarse-grained types for each language, derived from Akasaki et al. (2019). As a result, we obtained 597,569 emerging contexts and 859,034 prevalent contexts from 37,374,820 posts for 20,571 entities with 6 coarse-grained and 185 fine-grained types for English. For Japanese, we obtained 259,484 emerging contexts and 440,751 prevalent contexts from 47,869,813 posts for 10,315 entities with 4 coarse-grained and 71 fine-grained types. The difference in the number of types reflects the degree of DBpedia development for each language. Table 1 shows the statistics of the obtained emerging entities and contexts. We see that the frequency of fine-grained types varies by language; for example, the English PERSON type includes many athlete entities, while the Japanese PERSON type does not. This reflects the fact that the coverage of entities in Wikipedia varies across languages.

Test Data
For non-homographic emerging entities, we built the test data in a similar way to the training data and then manually cleaned them for reliable evaluation. Specifically, we collected the titles of Wikipedia articles as entities that appeared more than 100 times in our Twitter archive from Jan. 1st, 2017 to June 20th, 2018 for English and from Jan. 1st, 2016 to June 20th, 2018 for Japanese. We then collected posts up to the date of the first burst for each entity. Since those entities may not actually be emerging, we removed entities whose posts were judged to include only prevalent contexts by two of three annotators (the first author and two graduate students), and also measured inter-rater agreement for this judgment.

For homographic emerging entities, we manually constructed the test data, since it is difficult to collect their contexts using distant supervision. We collected the titles of Wikipedia articles that each have a disambiguation page and gathered the newest entity of each name together with its posts from the same period. Since those entities share contexts with other entities of the same name, we asked the three annotators to identify the exact day on which the target entity first appears with emerging contexts for the given type. We adopted entities whose answers (days) were agreed upon by two or more annotators. We obtained an inter-rater agreement of 0.684 for English and 0.665 for Japanese by Fleiss' Kappa; both show substantial agreement. We collected the posts of that day and the previous day and finally obtained a total of 5,931 posts for 200 emerging entities in English and 13,430 posts for 200 entities in Japanese (see Appendix (Table 6) for the statistics).

Table 2 shows some examples of collected entities and their posts (excerpts). The first example is a non-homographic emerging entity in the training data. The second example is a non-homographic emerging entity in the test data; from this example, we see that some contexts are useless for guessing the type (e.g., No. 3). The third example is a homographic emerging entity in the test data; as we can see, it contains a noisy context (e.g., No. 3) that is unrelated to the target entity. We thus have to properly select only the related contexts of the target entity to predict its type.

[Figure 2: Overview of our entity typing model (N = 2): three networks process contexts, entity, and meta-information, respectively, using MI-learning.]

Proposed Method
This section presents a method for typing emerging entities in microblogs. Microblogs have the following characteristics: most posts are short and noisy; several posts about the same topic appear close together in time; and posts carry meta-information such as usernames and URLs that is useful for inferring the type. We thus develop a neural typing model based on diverse features and MI-learning (§ 4.1).
Considering the existence of homographic entities (e.g., Go), one may want to select only the posts that are relevant to the target entity, rather than using all posts when performing MI-learning. We thus develop a context selection model that ranks emerging contexts of the target entity ( § 4.2). In the following, we describe the details of each model and how to train and test the models.

Entity Typing Model
To capture the characteristics of emerging entities from diverse perspectives, we develop a modular model that consists of three neural networks (Figure 2): a Context Network and an Entity Network that encode contexts and entities, based on Yaghoobzadeh et al. (2018) but refining their classic CNN-based structure with a GRU (Bahdanau et al., 2015) and a self-attention mechanism (Lin et al., 2017), and a Meta Network that encodes meta-information specific to microblogs. We rely on MI-learning (Riedel et al., 2010), which assigns a single label to a bag of multiple instances, to increase the number of clues and to mitigate the noise induced by distant supervision. The final prediction is made by feeding the outputs of the networks into a softmax layer through a feed-forward network. We describe the details of each network hereafter.

Context Network
This network captures the contexts of the given posts; it differs from the Context Model of Yaghoobzadeh et al. (2018) in that we replace the CNN with a GRU and introduce a self-attention mechanism to capture longer-range relationships and dependencies between words (Yin et al., 2017). Specifically, we encode the given entity with MI-learning by inputting N contexts in which the entity appears. We convert each word w_{it}, t ∈ [1, S], of the i-th input context into x_{it} = W_w w_{it} using the embedding matrix W_w. We feed this into a bi-directional GRU as h_{it} = BiGRU(x_{it}) and apply self-attention over the entire sequence of hidden states to capture word relations: we first obtain the similarity u_{ijk} between h_{ij} and h_{ik}, using additive attention (a feed-forward network) to calculate the alignment scores; we then compute the importance weight α_{ijk} with the softmax function; finally, we obtain h̃_{ij} as the weighted sum of the hidden states, h̃_{ij} = Σ_k α_{ijk} h_{ik}. These h̃_{ij} are concatenated to form the sentence representation s_i = [h̃_{i1}; …; h̃_{iS}].
Once we have the N sentence representations s_1, …, s_N, we apply self-attention to them again to capture the relations between sentences, obtaining ś_1, …, ś_N. These ś_i are concatenated and used as the output o_context = [ś_1; …; ś_N].
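The additive self-attention step can be sketched in a few lines of NumPy. Parameter shapes and names (W1, W2, v) are placeholders; a real implementation would learn them jointly with the rest of the network:

```python
import numpy as np

def additive_self_attention(H, W1, W2, v):
    """Additive self-attention over hidden states H of shape (S, d).
    W1, W2: (d, a) projections; v: (a,) scoring vector (learned in
    practice; here they are just inputs). Returns (S, d) states, each
    a softmax-weighted sum of all hidden states."""
    A, B = H @ W1, H @ W2                               # (S, a) each
    u = np.tanh(A[:, None, :] + B[None, :, :]) @ v      # (S, S) scores u_jk
    u -= u.max(axis=-1, keepdims=True)                  # numerical stability
    alpha = np.exp(u) / np.exp(u).sum(axis=-1, keepdims=True)  # softmax over k
    return alpha @ H                                    # weighted sums h~_j
```

With all-zero parameters the scores are uniform and each output row reduces to the mean of H, which makes the computation easy to sanity-check.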

Entity Network
This network captures the surface form of a given entity; it differs from the Global Model of Yaghoobzadeh et al. (2018) in that we replace the CNN with a GRU and remove the KB embeddings of the target entity, because they are not available for emerging entities. This network predicts the type of the target entity from its sequence of characters and words. We convert each character c_t, t ∈ [1, C], of the target entity into x_t = W_c c_t using the embedding matrix W_c. As in the Context Network, we feed this into a bi-directional GRU and obtain the character-based entity representation h = BiGRU(x_t).
Tokens inside the entity name are also useful clues. We obtain a token representation v by simply averaging the pre-trained word embeddings t_j of the T tokens in the entity, v = (1/T) Σ_j t_j. These representations are concatenated and used as the output o_entity = [h; v].

Meta Network
In addition to the contexts and the entity name, meta-information such as URLs and user (author) information is useful for typing emerging entities in microblogs. For example, URLs (e.g., https://blog.playstation.com/2020/12/10/returnal-launches-on-ps5-march-19-2021/) often include clues to the entity type, and users such as official accounts often post about a specific type of entity (e.g., @NintendoAmerica often announces their new game products). Moreover, we can extract useful knowledge from KBs about in-KB entities that co-occur with the target entity.
We thus extract the above meta-information from the N input posts and convert it into a feature vector. For user information, we simply extract the authors' user IDs. As for URLs, we extract all URLs from the input. For each URL, we discard the URL parameters after the '?' or '&' and then split the remaining string on the delimiters '-', '/', '_', and '+'. The resulting data are converted into a one-hot vector z, which is fed into a one-hidden-layer feed-forward network as f = W_z z + b_z.
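The URL tokenization described above can be sketched as follows (an illustrative reimplementation of the description in the text, not the authors' code):

```python
import re

def url_tokens(url):
    """Drop query parameters after '?' or '&', then split the rest of
    the URL on the delimiters '-', '/', '_', '+' to get feature tokens."""
    path = re.split(r"[?&]", url, maxsplit=1)[0]
    return [tok for tok in re.split(r"[-/_+]", path) if tok]
```

On the PlayStation URL above, this yields type-bearing tokens such as 'returnal' and 'ps5'.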
Entities that co-occur with the target entity also provide clues for inferring the type. For example, an entity of the Actor type is likely to co-occur with existing entities of related types such as Film and Award. To obtain entity information, we extract entity embeddings e_i, i ∈ [1, E], from the N input posts using the method of Yamada and Shindo (2019). To capture the relationships between these entities, we apply self-attention to them in the same way as in the Context Network, obtaining é_1, …, é_E. These representations are concatenated with f and used as the output o_meta = [é_1; …; é_E; f].

Context Selection Model
At test time, we input an entity with a burst of posts, which are retrieved by naive string matching. However, those posts can include contexts of homographic entities (e.g., No. 3 for Another Life in Table 2) and noisy posts that carry no clue about the entity type (e.g., No. 3 for Ben Sheaf in Table 2).
To address these issues, we take advantage of emerging contexts of the target entity; if we collect only emerging contexts, 1) we can utilize appropriate contexts for the target entity since two emerging entities with the same name are unlikely to emerge in a short period of time, and 2) emerging contexts by definition include enough information for the readers to understand the target entity.
We thus develop a context selector that predicts whether a given context is an emerging context. Specifically, we train a bi-directional GRU that performs binary classification using the emerging and prevalent contexts collected in § 3.3. Using this model, we score each context in the test data with the predicted probability of being an emerging context. For each entity, the top-N contexts by score are used as input to the typing model (Figure 3).
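The selection step itself reduces to ranking by classifier score and truncating. The sketch below substitutes a toy cue-word scorer for the trained BiGRU classifier; the cue list is entirely hypothetical:

```python
def emerging_score(text, cues=("announce", "new", "release", "trailer")):
    """Toy stand-in for the binary classifier: count novelty cue words.
    The cue list is hypothetical; the paper uses a trained BiGRU."""
    t = text.lower()
    return sum(cue in t for cue in cues)

def select_contexts(contexts, n, scorer=emerging_score):
    """Keep the top-n contexts by predicted emerging-context score,
    to be fed to the typing model."""
    return sorted(contexts, key=scorer, reverse=True)[:n]
```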

Model Training
Two issues in developing the typing and context selection models are how to utilize the constructed training data and how to select the input for the typing model during training. In this study, we simply train each model independently using the same data. For the context selection model, we feed it the emerging and prevalent contexts of the constructed training data. For the typing model, since we use N emerging contexts (posts) at test time, we repeatedly pick N emerging contexts in chronological order from the training data and input each group of N posts into the model, to fully exploit an entity's burst of posts (Figure 3). We pretrain with the prevalent contexts and then fine-tune the typing model to improve its robustness. In the experiments, we compare our model with a model that randomly selects contexts at both training and test time.
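The training-time input construction can be sketched as follows (a simplified illustration; the batching details of the actual implementation may differ):

```python
def chronological_groups(contexts, n):
    """Split an entity's emerging contexts (already in chronological
    order) into consecutive groups of n posts; each group forms one
    MI-learning training instance for the typing model."""
    return [contexts[i:i + n] for i in range(0, len(contexts) - n + 1, n)]
```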

Experiments
We performed emerging entity typing using the English and Japanese Twitter datasets built in § 3.3.

Models
We describe the typing models compared in the experiments. Since all models employ MI-learning, we use the same parameter N across models to control the number of input posts.

Proposed (fine-tune) trains the proposed typing model with prevalent contexts and then fine-tunes it with emerging contexts. At test time, we applied the context selection model to all the contexts of each entity in the test data to form the input.

Proposed (random) randomly extracts 100 contexts per entity from all the posts collected in § 3.3 and trains the proposed model. At test time, we randomly selected the contexts for each entity in the test data. This setting is meant to confirm the effect of discriminating types of contexts (domains).

Yaghoobzadeh uses the model of Yaghoobzadeh et al. (2018) modified for our task settings. This model predicts the type of the given entity from its name and contexts using a CNN. Compared to ours, it randomly selects contexts and does not use meta-information. We randomly extracted 100 contexts per entity from the contexts collected in § 3.3 and trained the model. At test time, we randomly selected the contexts for each entity in the test data.

Settings
We tokenized each input post using spaCy (ver. 2.0.12) with the en_core_web_sm model for English and MeCab (ver. 0.996) with ipadic (ver. 2.7.0) for Japanese. We implemented all the models using Keras (ver. 2.3.1). To initialize the word embedding layers for English, we used 200-dimensional word embeddings pre-trained with GloVe (Pennington et al., 2014) on 2B English posts. For Japanese, we trained 200-dimensional word embeddings with GloVe on 800M Japanese posts posted from Mar. 11th, 2011 to Mar. 11th, 2012 in our Twitter archive. For the Meta Network, we extracted the top 20,000 most frequent tokens from URLs and usernames in the training data and used them as z (§ 4.1.3). We used wikipedia2vec with the Wikipedia dump of Dec. 26th, 2015 to extract 100-dimensional embeddings of the entities that co-occur with the target entity.
We optimized all the models using Adam (Kingma and Ba, 2015). We finally chose the model at the epoch with the highest accuracy on the development data. We show the detailed hyperparameters of the models in Appendix (Table 7).
For the model of Yaghoobzadeh, we adopted the same configurations and hyperparameters as in their study.
For each entity in the test data, we perform entity typing once using the contexts selected by each model. For each N, we trained and tested each model 10 times, calculated the micro-F1 (Ling and Weld, 2012), and averaged the results.

Results and Analysis

Table 3 shows the results over all types and for each coarse-grained type when N = 10. For most of the types, Proposed (fine-tune) outperformed the other methods for both English and Japanese. This indicates the validity of our typing model and the importance of discriminating emerging contexts from others (vs. Proposed (random)). Especially for homographic entities, whose posts contain many noisy contexts of other entities, our context selection method that identifies emerging contexts worked effectively.

We also examined how performance changes with the number of input posts, N. Although the performance of all the models improves as N is increased, the gain almost converges at N = 8. The improvement over N = 1 shows the effectiveness of using multiple posts in this task.
Cross-language analysis Interestingly, the performance on Japanese homographic entities is lower than that on English ones, even though the number of target types is smaller for Japanese (71 vs. 185). This is probably because in languages such as Japanese and Chinese, where entities are not capitalized, contexts are more likely to be contaminated by common nouns; for example, '香水 (kosui)' refers to both the common noun 'perfume' and the name of the Japanese song released in 2020. In fact, in Japanese, the performance of the models without context selection dropped significantly.
Ablation study To verify the contribution of each network of the proposed model, we performed an ablation test. Figure 6 shows the performance change of Proposed (fine-tune) for the English data. We can see that there are significant performance drops when the Context Network is removed. The Entity Network is effective for homographic entities but not for non-homographic entities. Since homographic entities may contain entities with the same name in the training data, it is natural that the Entity Network trained on such data would make biased predictions for such entities. For the Meta Network, it is effective for non-homographic entities with limited contexts (N < 4) and homographic entities. Such meta-information helps the model make robust predictions even when the contexts are scarce or contaminated by homographic entities.
Examples Table 4 lists examples of predictions by Proposed (fine-tune) (correct above, incorrect below; English, N = 2). In the first example, although it is difficult to determine the type using only the first context (N = 1), by adding another context (N = 2), the proposed model utilized it (about a baseball draft) and determined the correct type. The second example is an entity whose type the proposed model predicted incorrectly. Although we can infer that "Sonos One" is an appliance since it appears with entities like "Google Home" and "Amazon Echo," the proposed method failed to predict the correct type because those entities are absent from the period before 2016 when the training data were collected. We thus need to update the training data periodically to cover the latest entities (concepts) by using a method like distant supervision.

Conclusions
We introduced a task of typing emerging entities in microblogs (§ 3.2). To perform this task, on the basis of the definition of emerging entities (§ 3.1), we constructed large-scale Twitter datasets for English and Japanese (§ 3.3). We developed a modular entity typing model (§ 4.1) that encodes different aspects of an emerging entity with MI-learning. To deal with the noisy contexts of homographic entities, we adopted a context selection model (§ 4.2) that differentiates emerging contexts from others. Experiments (§ 5) demonstrated that our method performs more accurately than the baseline model for both non-homographic and homographic emerging entities. We confirmed the importance of selectively using emerging contexts for training and testing the typing model, and verified the effectiveness of each network of the proposed typing model. For future work, we plan to perform further profiling of emerging entities, such as relation extraction, to organize emerging and existing knowledge. We release the dataset used in our experiments.

A.1 Annotation Guideline
We provided the following instructions and some examples (e.g., Table 2) for the annotators. The instructions have been translated from Japanese for readability.
1. Please read the following definition of emerging entities (omitted here since it is identical to the one given in § 3.1) carefully and see examples of emerging entities.
2. For non-homographic entities, you will be given entities and their tweets. Please check these tweets and then label the entity as "emerging" if one or more emerging contexts for the entity appear.
3. For homographic entities, you will be given tweets with entities on each day. Please check these tweets in date order and "fill in the dates" on which one or more emerging contexts for the entity appear.

Tables 5 and 6 show the statistics of the test data. As for homographic entities, the data are unbalanced because such entities tended to be concentrated in certain types, such as names of people and creative works. Table 7 shows the hyperparameters of our typing model and the context selection model.