Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention

Natural language processing techniques have demonstrated promising results in keyphrase generation. However, one of the major challenges in neural keyphrase generation is processing long documents using deep neural networks. Generally, documents are truncated before being given as input to neural networks. Consequently, the models may miss essential points conveyed in the target document. To overcome this limitation, we propose SEG-Net, a neural keyphrase generation model that is composed of two major components, (1) a selector that selects the salient sentences in a document and (2) an extractor-generator that jointly extracts and generates keyphrases from the selected sentences. SEG-Net uses Transformer, a self-attentive architecture, as the basic building block with a novel layer-wise coverage attention to summarize most of the points discussed in the document. The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.


Introduction
Keyphrases are short pieces of text that summarize the key points discussed in a document. They are useful for many natural language processing and information retrieval tasks (Wilson et al., 2005; Berend, 2011; Tang et al., 2017; Subramanian et al., 2018; Zhang et al., 2017b; Wan and Xiao, 2008; Jones and Staveley, 1999; Kim et al., 2013; Hulth and Megyesi, 2006; Hammouda et al., 2005; Wu and Bolivar, 2008; Dave and Varma, 2010). In the automatic keyphrase generation task, the input is a document, and the output is a set of keyphrases that can be categorized as present or absent keyphrases. Present keyphrases appear exactly in the target document, while absent keyphrases are only semantically related and have partial or no overlap with the target document. We provide an example of a target document and its keyphrases in Figure 1.
* Work done during internship at Yahoo Research.
Figure 1: An example target document (the title and abstract of an article on natural language processing technologies for a language learning environment) with its sentences labeled as salient or non-salient.
In recent years, the neural sequence-to-sequence (Seq2Seq) framework (Sutskever et al., 2014) has become the fundamental building block in keyphrase generation models. Most of the existing approaches (Meng et al., 2017; Chen et al., 2018; Yuan et al., 2020; Chen et al., 2019b) adopt the Seq2Seq framework with attention (Luong et al., 2015; Bahdanau et al., 2014) and copy mechanism (See et al., 2017; Gu et al., 2016). However, present phrases indicate the indispensable segments of a target document. Emphasizing those segments improves document understanding, which can lead a model to coherent absent phrase generation. This motivates jointly modeling keyphrase extraction and generation (Chen et al., 2019a).
To generate a comprehensive set of keyphrases, reading the complete target document is necessary. However, to the best of our knowledge, none of the previous neural methods read the full content of a document, as it can be thousands of words long. Existing models truncate the target document; they take the first few hundred words as input and ignore the rest of the document, which may contain salient information. At the same time, a significant fraction of a long document may not be associated with the keyphrases. Presumably, selecting the salient segments from the target document and then predicting the keyphrases from them would be effective.
To address the aforementioned challenges, in this paper, we propose SEG-Net (which stands for Select, Extract, and Generate), which has two major components: (1) a sentence-selector that selects the salient sentences in a document, and (2) an extractor-generator that predicts the present keyphrases and generates the absent keyphrases jointly. The motivation behind the sentence-selector is to decompose a long target document into a list of sentences and identify the salient ones for keyphrase generation. We consider a sentence salient if it contains present keyphrases or overlaps with absent keyphrases. As shown in Figure 1, we split the document into a list of sentences and classify them with salient and non-salient labels. A similar notion is adopted in prior works on text summarization (Chen and Bansal, 2018; Lebanoff et al., 2019) and question answering (Min et al., 2018). We employ Transformer (Vaswani et al., 2017) as the backbone of the extractor-generator in SEG-Net.
We equip the extractor-generator with a novel layer-wise coverage attention such that the generated keyphrases summarize the entire target document. The layer-wise coverage attention keeps track of the target document segments that are covered by previously generated phrases to guide the self-attention mechanism in Transformer while attending to the encoded target document in future generation steps. We evaluate SEG-Net on five benchmarks from scientific articles and two benchmarks from web documents to demonstrate its effectiveness over the state-of-the-art neural generative methods. We perform ablation and analysis to show that selecting salient sentences improves present keyphrase extraction and that the layer-wise coverage attention facilitates absent keyphrase generation. Our novel contributions are as follows.
1. SEG-Net, which first identifies the salient sentences in the target document and then uses them to generate a set of keyphrases.
2. A novel layer-wise coverage attention.

Problem Definition
The keyphrase generation task is defined as follows: given a text document x, generate a set of keyphrases K = {k 1 , k 2 , . . . , k |K| }, where each keyphrase k i is a sequence of words. A text document can be split into a list of sentences S x = {s 1 x , s 2 x , . . . , s |S x | x }, where each sentence s i x is a consecutive subsequence of the document x with begin index j ≤ |x| and end index (j + |s i x |) < |x|. In the literature, keyphrases are categorized into two types, present and absent. A present keyphrase is a consecutive subsequence of the document, while an absent keyphrase is not. However, an absent keyphrase may have a partial overlap with the document's word sequence. We denote the sets of present and absent keyphrases as K p = {k 1 p , k 2 p , . . . , k |K p | p } and K a = {k 1 a , k 2 a , . . . , k |K a | a }, respectively. Hence, we can express a set of keyphrases as K = K p ∪ K a .
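The present/absent distinction above can be made concrete with a short sketch. The function below is illustrative only; it assumes whitespace tokenization and exact matching, which is a simplification of the actual preprocessing.

```python
def split_keyphrases(document, keyphrases):
    """Split a keyphrase set into (present, absent) lists for a document."""
    doc_tokens = document.lower().split()
    present, absent = [], []
    for phrase in keyphrases:
        p_tokens = phrase.lower().split()
        n = len(p_tokens)
        # A phrase is "present" iff it occurs as a consecutive subsequence.
        found = any(doc_tokens[i:i + n] == p_tokens
                    for i in range(len(doc_tokens) - n + 1))
        (present if found else absent).append(phrase)
    return present, absent

doc = "we study neural keyphrase generation with coverage attention"
present, absent = split_keyphrases(doc, ["keyphrase generation", "text mining"])
# present: ["keyphrase generation"]; absent: ["text mining"]
```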
SEG-Net decomposes the keyphrase generation task into three sub-tasks. We define them below.
Task 1 (Salient Sentence Selection). Given a list of sentences S x , predict a binary label (0/1) for each sentence s i x . The label 1 indicates that the sentence contains a present keyphrase or overlaps with an absent keyphrase. The output of the selector is a list of salient sentences S sal x .
Task 2 (Present Keyphrase Extraction). Given S sal x as a concatenated sequence of words, predict a label (B/I/O) for each word that indicates whether it is a constituent of a present keyphrase.
Task 3 (Absent Keyphrase Generation). Given S sal x as a concatenated sequence of words, generate a concatenated sequence of keyphrases in a sequence-to-sequence fashion.
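As an illustration of the B/I/O labeling scheme in Task 2, the hypothetical helper below derives supervision tags from a list of present keyphrases, assuming pre-tokenized input:

```python
def bio_tags(doc_tokens, present_phrases):
    """Label each token B (begin), I (inside), or O (outside) with
    respect to any occurrence of a present keyphrase."""
    tags = ["O"] * len(doc_tokens)
    for phrase in present_phrases:
        p_tokens = phrase.split()
        n = len(p_tokens)
        for i in range(len(doc_tokens) - n + 1):
            if doc_tokens[i:i + n] == p_tokens:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
    return tags

tokens = "neural keyphrase generation is fun".split()
print(bio_tags(tokens, ["keyphrase generation"]))
# ['O', 'B', 'I', 'O', 'O']
```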

SEG-Net for Keyphrase Generation
Our proposed model, SEG-Net, jointly learns to extract and generate present and absent keyphrases from the salient sentences in a target document.
The key advantage of SEG-Net is the maximal utilization of the information from the input text in order to generate a set of keyphrases that summarize all the key points in the target document. SEG-Net consists of a sentence-selector and an extractor-generator. The sentence-selector identifies the salient sentences from the target document (Task 1) that are fed to the extractor-generator to predict both the present and absent keyphrases (Task 2, 3). We detail them in this section.

Embedding Layer
The embedding layer maps each word in an input sequence to a low-dimensional vector space. We train three embedding matrices, W e , W pos , and W seg that convert a word, its absolute position, and segment index into vector representations of size d model . The segment index of a word indicates the index of the sentence that it belongs to. In addition, we obtain a character-level embedding for each word using Convolutional Neural Networks (CNN) (Kim, 2014a). To learn a fixed-length vector representation of a word, we add the four embedding vectors element-wise. To form the vector representations of the keyphrase tokens, we only use their word and character-level embeddings.

Sentence-Selector
The objective of the sentence-selector is to predict the salient sentences in a document, as described in Task 1. Given a sentence s i x = [x j , . . . , x j+|s i |−1 ] from a document x, the selector predicts the salience probability of that input sentence. First, the embedding layer maps each word in the sentence into a d model dimensional vector. The sequence of word vectors is fed to a stack of Transformer encoder layers that produce a sequence of output representations [o j , . . . , o j+|s i |−1 ]. Then we apply max and mean pooling on the output representations to form s max , s mean ∈ R d model , which are concatenated, s pool = s max ⊕ s mean , to form the sentence embedding vector. We feed the vector s pool through a three-layer, batch-normalized (Ioffe and Szegedy, 2015) maxout network (Goodfellow et al., 2013) to predict the salience probability.
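The pooling step can be sketched as follows; this is a pure-Python illustration, whereas in practice it is a batched tensor operation:

```python
def sentence_embedding(word_vectors):
    """Concatenate max-pooled and mean-pooled representations:
    s_pool = s_max (+) s_mean, of size 2 * d_model."""
    d = len(word_vectors[0])
    s_max = [max(v[k] for v in word_vectors) for k in range(d)]
    s_mean = [sum(v[k] for v in word_vectors) / len(word_vectors)
              for k in range(d)]
    return s_max + s_mean  # list concatenation stands in for vector concat

print(sentence_embedding([[1.0, 2.0], [3.0, 0.0]]))
# [3.0, 2.0, 2.0, 1.0]
```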

Extractor-Generator
The extractor-generator module in SEG-Net takes a list of salient sentences from a document as input, concatenated to form a sequence of words, and predicts the present and absent keyphrases. We illustrate the extractor-generator module in Figure 2 and describe its major components as follows.
Figure 2: Overview of the Extractor-Generator module of SEG-Net. The major components are encoder, extractor, and decoder. The encoder encodes the salient sentences of the input document. The extractor predicts the present keyphrase's constituent words while the decoder generates the absent keyphrases word by word.
Encoder The encoder consists of an embedding layer followed by an L-layer Transformer encoder. Each word in the input sequence [x 1 , . . . , x n ] is first mapped to an embedding vector. Then the sequence of word embeddings is fed to the Transformer encoder that produces contextualized word representations [o l 1 , . . . , o l n ] where l = 1, . . . , L using the multi-head self-attention mechanism.
Extractor In a nutshell, the extractor acts as a 3-way classifier that predicts a tag for each word in the BIO format. The extractor takes [o L 1 , . . . , o L n ] as input and predicts the probability of each word being a constituent of a present keyphrase through a two-layer feed-forward network: P (tag i ) = softmax(W r 2 ReLU(W r 1 o L i + b r 1 ) + b r 2 ), where W r 1 , W r 2 , b r 1 , b r 2 are trainable parameters.
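Once the extractor has emitted a BIO tag per word, present keyphrases can be read off the tag sequence. A minimal decoding sketch (an illustrative helper, not the exact implementation):

```python
def phrases_from_bio(tokens, tags):
    """Recover present keyphrases from a B/I/O tag sequence."""
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                   # a new phrase starts here
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:     # continue the open phrase
            current.append(token)
        else:                            # "O" (or a stray "I") closes it
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

print(phrases_from_bio("support vector machine and svm".split(),
                       ["B", "I", "I", "O", "B"]))
# ['support vector machine', 'svm']
```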
Decoder The decoder generates the absent keyphrases as a concatenated sequence of words [y * 1 , . . . , y * m ], where m is the sum of the lengths of the phrases. The decoder predicts the absent phrases word by word given previously predicted words in a greedy fashion. The decoder employs an embedding layer, L layers of Transformer decoder, and a softmax layer. The embedding layer converts the words into vector representations that are fed to the Transformer decoder. We use relative positional encoding (Shaw et al., 2018) to inject order information of the keyphrase terms. The output of the last (L-th) decoder layer, h L 1 , . . . , h L m , is passed through a softmax layer to predict a probability distribution over the vocabulary V .
Coverage Attention The coverage attention (Tu et al., 2016; Yuan et al., 2020; Chen et al., 2018) keeps track of the parts of the document that have been covered by previously generated phrases and encourages future generation steps to summarize the other segments of the target document. The underlying idea is to decay the attention weights of the previously attended input tokens while the decoder attends to the encoded input tokens at time step t. To equip the multi-layer structure of the Transformer with a layer-wise coverage attention, we adopt the layer-wise encoder-decoder attention technique (He et al., 2018). We compute the attention weights α ti = exp(e ti ) / Σ n k=1 exp(e tk ) in the encoder-decoder attention at each layer, where e ti is the scaled dot-product score between the target token y t and the input token x i , adjusted by the coverage accumulated over the previous decoding steps.
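To make the decay idea concrete, the sketch below subtracts the attention mass accumulated over previous decoding steps from the raw scores before the softmax. This particular decay rule is an assumption for illustration, not necessarily the exact formulation used in the model:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def coverage_attention(raw_scores):
    """raw_scores[t][i]: scaled dot-product score at decoding step t.
    Returns attention weights where previously attended tokens decay."""
    n = len(raw_scores[0])
    coverage = [0.0] * n          # accumulated attention per input token
    all_weights = []
    for row in raw_scores:
        adjusted = [s - c for s, c in zip(row, coverage)]
        weights = softmax(adjusted)
        all_weights.append(weights)
        coverage = [c + w for c, w in zip(coverage, weights)]
    return all_weights

w = coverage_attention([[2.0, 0.0], [2.0, 0.0]])
# The heavily attended token 0 receives less weight at step 1 than at step 0.
```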
Copy Attention Absent keyphrases have partial or no overlap with the target document. With the copy mechanism, we want the decoder to learn to copy phrase terms that overlap with the target document. Hence, we adopt the copying mechanism and use an additional attention layer to learn the copy distribution on top of the decoder stack. Formally, we take the output from the last layer of the encoder, [o L 1 , . . . , o L n ], compute the attention weights a L ti of the decoder output h L t over the encoder outputs at time step t, and then compute the context vector c L t = Σ n i=1 a L ti o L i . The copy mechanism uses the attention weights a L ti as the probability distribution P (y * t = x i |u t = 1) = a L ti to copy the input tokens x i . We compute the probability of using the copy mechanism at decoding step t as p(u t = 1) = σ(w u [h L t || c L t ] + b u ), where || denotes the vector concatenation operator and w u , b u are trainable parameters. Then we obtain the final probability distribution for the output token y * t as: P (y * t ) = P (u t = 0)P (y * t |u t = 0) + P (u t = 1)P (y * t |u t = 1), where P (y * t |u t = 0) is defined in Eq. (1). All probabilities are conditioned on y * 1:t−1 , x, but we omit them to keep the notation simple.
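The mixing of the generation and copy distributions can be sketched as a simplified single-step version; duplicate source tokens simply accumulate copy mass:

```python
def final_distribution(p_vocab, copy_weights, src_token_ids, p_copy):
    """P(y) = (1 - p_copy) * P_vocab(y) + p_copy * sum_i a_i * [x_i == y]."""
    out = [(1.0 - p_copy) * p for p in p_vocab]
    for a_i, token_id in zip(copy_weights, src_token_ids):
        out[token_id] += p_copy * a_i   # route copy mass to source tokens
    return out

# Vocabulary of 3 ids; one source token with id 2 and full attention weight.
dist = final_distribution([0.5, 0.5, 0.0], [1.0], [2], p_copy=0.4)
# dist == [0.3, 0.3, 0.4], still a valid probability distribution
```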

Learning Objectives
We individually train the sentence-selector and the extractor-generator in SEG-Net.
Sentence-Selector For each sentence in a document x, the selector predicts the salience label. We choose the sentences containing present keyphrases or overlapping with absent keyphrases as the gold salient sentences and use the weighted cross-entropy loss for selector training: L s = −(1/|S x |) Σ |S x | j=1 [ω y j log p j + (1 − y j ) log(1 − p j )], where y j ∈ {0, 1} is the ground-truth label for the j-th sentence, p j is the predicted salience probability, and ω is a hyper-parameter to balance the importance of salient and non-salient sentences.
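A minimal sketch of this weighted loss over per-sentence probabilities p_j and labels y_j (illustrative, not the training code):

```python
import math

def selector_loss(probs, labels, omega):
    """Weighted binary cross-entropy: salient (positive) sentences are
    scaled by omega to balance the two classes."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= omega * y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

loss = selector_loss([0.9, 0.2], [1, 0], omega=2.0)
```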
Extractor-Generator The extractor-generator takes a list of salient sentences as a concatenated sequence of words. For each word of the input sequence, the extractor predicts whether the word appears in a contiguous subsequence that matches a present keyphrase. The extractor treats the task as a sequence tagging task, and we compute the extraction loss L e as in Eq. (3).
The decoder in extractor-generator generates the list of absent keyphrases in a sequence-to-sequence fashion. We compute the negative log-likelihood L g of the ground-truth keyphrases.
where n is the sum of the lengths of all absent phrases. The overall loss to train the extractor-generator is computed as a weighted average of the extraction and generation losses, L = β L e + (1 − β) L g .


Experiment Setup

Datasets and Preprocessing
We conduct experiments on five scientific benchmarks from the computer science domain. We use the training set of the largest dataset, KP20k, for training and employ the testing sets from all the benchmarks to evaluate the baselines and our models. The KP20k dataset consists of 530,000 and 20,000 articles for training and validation, respectively. We remove all the articles from the training portion of KP20k that overlap with its validation set or with any of the five testing sets. After filtering, the KP20k dataset contains 509,818 training examples that we use to train all the baselines and our models.
We perform experiments on two web-domain datasets that consist of news articles and general web documents. The first dataset is KPTimes (Gallina et al., 2019), which provides news text paired with editor-curated keyphrases. The second dataset is an in-house dataset generated from the click logs of a large-scale commercial web search engine. Specifically, we randomly sampled web documents that were clicked at least once during the month of February in 2019. For each sampled web document, we collected 20 queries that led to the highest number of clicks on it. This design choice is motivated by the observation that queries frequently leading to clicks on a web document usually summarize the main concepts in the document. We further filter out the less relevant queries by ranking them based on the number of clicks. The relevance score for each query is assigned by an in-house query-document relevance model. We also remove duplicate queries by comparing their bag-of-words representations. The dataset consists of 206,000, 24,000, and 26,000 unique web documents for training, validation, and evaluation, respectively.
Statistics of the test portion of the experiment datasets are provided in Table 1 in Appendix. Following Meng et al. (2017), we apply lowercasing, tokenization and replacing digits with digit symbol to preprocess all the datasets. We use spaCy (Honnibal et al., 2020) for tokenization and collecting the sentence boundaries.

Baseline Models and Evaluation Metrics
We compare the performance of SEG-Net with four state-of-the-art neural generative methods, including catSeq. Earlier generative models produce one keyphrase at a time in a sequence-to-sequence fashion and use beam search to generate multiple keyphrases. In contrast, following Chan et al. (2019), we concatenate all the keyphrases into one output sequence using a special delimiter [sep], and use greedy decoding during inference. We train all the baselines using the maximum-likelihood objective. We use the publicly available implementation of these baselines 2 in our experiments.
To measure the accuracy of the sentence-selector, we use the averaged (macro) F1 score. We also compute precision and recall to compare the performance of the sentence-selector with a baseline. While SEG-Net selects up to N predicted salient sentences, the baseline method selects the first N sentences from the target document so that their total length does not exceed a predefined word limit (200 words). In keyphrase generation, accuracy is typically computed by comparing the top k predicted keyphrases with the ground-truth keyphrases. We follow Chan et al. (2019) to perform evaluation and report F1@M and F1@5 for all the baselines and our models.
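The F1@M metric compares the full set of M predicted phrases against the ground truth; a minimal sketch, assuming phrases are already normalized strings:

```python
def f1_at_m(predicted, ground_truth):
    """F1 over the full set of M predictions vs. the ground-truth set."""
    pred, gold = set(predicted), set(ground_truth)
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

score = f1_at_m(["a", "b", "c"], ["a", "d"])
# precision = 1/3, recall = 1/2, F1 = 0.4
```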

Implementation Details
Hyper-parameters We use a fixed vocabulary of the most frequent |V | = 50,000 words in both the sentence-selector and the extractor-generator. We set d model = 512 for all the embedding vectors. We set L = 6, h = 8, d k = 64, d v = 64, d ff = 2,048 in Transformer across all our models. We detail the hyper-parameters in Table 11 in Appendix.
2 https://github.com/kenchan0226/keyphrase-generation-rl
Training We perform a grid search for β over [0.4, 0.5, 0.6] on the dev set and find that β = 0.5 results in the best performance. Loss weights for positive samples, ω, are set to 0.7 and 2.0 during selector and extractor training, respectively. We train all our models using Adam (Kingma and Ba, 2015) with a batch size of 80 and a learning rate of 10 −4 . During training, we use dropout and gradient clipping. We halve the learning rate when the validation performance drops and stop training if it does not improve for five successive iterations. We train the sentence-selector and extractor-generator modules for a maximum of 15 and 25 epochs, respectively. Training the modules takes roughly 10 and 25 hours on two GeForce GTX 1080 GPUs, respectively.
Decoding The absent keyphrases are generated as a concatenated sequence of words. Hence, unlike prior works (Meng et al., 2017; Chen et al., 2018, 2019b; Zhao and Zhang, 2019), we use greedy search as the decoding algorithm during testing, and we force the decoder never to output the same trigram more than once to avoid repetitions in the generated keyphrases. This is accomplished by not selecting a word that would create a trigram that already exists in the previously decoded sequence. It is a well-known technique utilized in text summarization (Paulus et al., 2018).
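The trigram-blocking constraint can be sketched as a simple check applied before each greedy step (an illustrative helper):

```python
def would_repeat_trigram(decoded, candidate):
    """True if appending `candidate` recreates a trigram that already
    appears in the decoded sequence (so the candidate must be skipped)."""
    if len(decoded) < 2:
        return False
    new_trigram = (decoded[-2], decoded[-1], candidate)
    seen = {tuple(decoded[i:i + 3]) for i in range(len(decoded) - 2)}
    return new_trigram in seen

decoded = ["a", "b", "c", "a", "b"]
# Appending "c" would repeat the trigram (a, b, c); "d" would not.
```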
We provide details about model implementations and references in Appendix for reproducibility.

Results
We compare our proposed model SEG-Net with the baselines on the scientific and web domain datasets. We present the experiment results in Table 2 (and Table 3 in Appendix). This is due to the extractive nature of SEG-Net and to documents having more closely related keyphrases (e.g., SEG-Net predicts the ground-truth keyphrases "Google" and "Apple" along with other relevant keyphrases such as "line" and "Amazon.com"; see the qualitative examples provided in Appendix). Therefore, we suggest that future work consider the dataset's nature when judging models in this respect.
Table 5: Ablation on SEG-Net without decoupling extraction and generation (DEG), salient sentence selection (SSS), and layer-wise coverage attention (LCA). We preclude one design choice at a time.

Analysis
The differences between the Transformer baseline and SEG-Net are (1) the decoupling of keyphrase extraction and generation, (2) the use of salient sentences for keyphrase prediction, and (3) the layer-wise coverage attention. We perform an ablation on these three design choices and present the results in Table 5.
Decoupling extraction and generation SEG-Net extracts present keyphrases and generates absent keyphrases, as suggested in Chen et al. (2019a), with a difference in the extractor: SEG-Net employs a 3-way classifier (to predict BIO tags) that enables the extraction of consecutive present keyphrases. The ablation study shows that separating extraction and generation boosts present keyphrase prediction (by as much as 2.8, 1.0, and 2.2 F1@M points on the NUS, SemEval, and KPTimes datasets, respectively).
Salient sentence selection One of SEG-Net's key contributions is the sentence-selector that identifies the salient sentences to minimize the risk of missing critical points due to truncating long target documents (e.g., web documents). The contribution of the sentence-selector to present keyphrase prediction is evident from the ablation study. The impact of using salient sentences to generate absent keyphrases is significant for the web domain datasets (e.g., 2.8 F1@M points on KPTimes). We show the performances on KPTimes test documents with different lengths in Figure 3, and the results suggest that SEG-Net improves absent keyphrase prediction significantly for longer documents; we credit this to the sentence-selector. The selector's accuracy on the KP20k and KPTimes datasets is 78.2 and 73.7, respectively, in terms of (macro) F1 score. We evaluate SEG-Net by providing the ground-truth salient sentences to quantify the improvement achievable with a perfect sentence-selector. We found that the present keyphrase prediction performance would have increased by 3.2 and 4.1 F1@M points with a perfect sentence-selector. We compare the sentence-selector with the baselines that select the first N sentences from the target document, and the results are presented in Table 6. SEG-Net's selector has a higher precision, which indicates that it processes input texts with more salient sentences. On the other hand, the recall is substantially lower for the scientific domain due to false-negative predictions. Our experiments suggest that salient sentence selection positively impacts performance and has additional room for improvement.
Table 6: Precision and recall computed by selecting N predicted salient sentences in SEG-Net, and the first N sentences from the target documents in the baselines. We set N for each target document so that the total length of the selected sentences does not exceed a limit of 200 words.
It is important to note that the baseline recall is close to 100.0 for the scientific domain datasets because the average length of the target documents from that domain is close to 200 words.
Layer-wise coverage attention The ablation study shows the positive impact of the layer-wise coverage attention in SEG-Net. The improvement in absent keyphrase generation for the KPTimes dataset (3.0 F1@M points) is significant, while it is relatively small in other experiment datasets. We hypothesize that the coverage attention helps when keyphrases summarize concepts expressed in different segments of a long document. We confirm our hypothesis by observing the performance trend with and without the coverage attention mechanism (we observe a similar trend as in Figure 3). We provide additional experiment results and qualitative examples in Appendix.

Related Work
Keyphrase extraction approaches identify important phrases that appear in a document. The existing approaches generally work in two steps. First, they select a set of candidate keyphrases based on heuristic rules (Hulth, 2003; Medelyan et al., 2008; Liu et al., 2011; Wang et al., 2016). In the second step, the selected candidates are scored according to their importance, computed by unsupervised ranking approaches (Wan and Xiao, 2008; Grineva et al., 2009) or supervised learning algorithms (Hulth, 2003; Witten et al., 2005; Medelyan et al., 2009; Nguyen and Kan, 2007; Lopez and Romary, 2010). Finally, the top-ranked candidates are returned as the keyphrases. Another pool of extractive solutions follows a sequence tagging approach (Luan et al., 2017; Zhang et al., 2016; Gollapalli et al., 2017; Gollapalli and Caragea, 2014). However, the extractive solutions can only predict the keyphrases that appear in the document and thus fail to predict the absent keyphrases.
Keyphrase generation methods aim at predicting both the present and absent phrases. Meng et al.

Conclusion
This paper presents SEG-Net, a keyphrase generation model that identifies the salient sentences in a target document to utilize maximal information for keyphrase prediction. In SEG-Net, we incorporate a novel layer-wise coverage attention to cover all the critical points in a document and diversify the present and absent keyphrases. We evaluate SEG-Net on seven benchmarks from scientific and web documents, and the experiment results demonstrate SEG-Net's effectiveness over the state-of-the-art methods on both domains.

A Additional Ablation Study
Variation of named entities A keyphrase can be expressed in different ways, such as "solid state drive" as "ssd" or "electronic commerce" as "e commerce". A model should receive credit if it generates any of those variations. Hence, Chan et al. (2019) aggregated name variations of the ground-truth keyphrases from the KP20k evaluation dataset using the Wikipedia knowledge base. We evaluate our model on that enriched evaluation set, and the experimental results are listed in Table 7. We observed that although SEG-Net extracts the present keyphrases, it can predict present phrases with variations such as "support vector machine" and "svm" if they co-exist in the target document.

Impact of embedding features The results are presented in Table 8. The character embeddings are employed as we limit the vocabulary to the most frequent |V | words. During our preliminary experiment, we observed that character embeddings have a notable impact in the web domain, where the actual vocabulary size can be large. The addition of segment embedding is also helpful, especially as the sentence-selector may predict salient sentences from any part of the document. We hypothesize that the sentence index guides the self-attention mechanism in the extractor-generator.
Fine-tuning via Reinforcement Learning Following Chan et al. (2019), we apply reinforcement learning (RL) to fine-tune the extractor-generator module of SEG-Net on absent keyphrase generation.

We report F1@5 and F1@M scores in this work, where M denotes the number of predicted keyphrases. We also compute F1@10 and F1@O, where O represents the number of ground-truth keyphrases, and the results are presented in Table 12. Many prior works have reported R@10 and R@50 for absent phrase generation. To compute R@50, we need to perform beam decoding to generate many keyphrases, typically more than 200 (Yuan et al., 2020). In our opinion, generating hundreds of keyphrases from a document does not truly reflect a model's ability to understand document semantics. Therefore, we prefer not to assess models in terms of the R@50 metric.

C Qualitative Analysis
We provide a few qualitative examples in Figure 4.

D Reproducibility
• We train and test the first four baseline models using their public implementation. We use the Transformer implementation from OpenNMT for catSeq (Transformer) and SEG-Net.
• We adopt the implementation of paired bootstrap test script to perform significance test.
• The preprocessed scientific article datasets are available here.
• KPTimes dataset is available here.
Figure 4: Qualitative examples of web-domain documents with predicted keyphrases: a news article on smart speakers powered by voice agents, a news article on the first domestic dengue fever case in Japan in nearly 70 years, and a photo report on Foodex Japan 2013. For the dengue article, catSeq predicts "dengue fever; japan; medicine and health"; SEG-Net predicts "dengue fever; japan; dengue; dengue virus; health organization; mosquitoes; vaccines immunization"; and the ground truth is "dengue fever; world health organization; dengue virus; infectious diseases".