An Integrated Search System for Korea Weather Data

We introduce W EATHER S EARCH , an integrated search system deployed at the Korea Meteorological Administration (KMA) 1 . W EATH - ER S EARCH enables users to retrieve all the relevant data for weather forecasting from a massive weather database with simple natu-ral language queries. We carefully design and conduct multiple expert surveys and interviews for template creation and apply data augmentation techniques including template filling to collect 4 million data points with minimal human labors. We then finetune mT5 (Xue et al., 2021) on the collected dataset and achieve an average MRR of 0.66 and an average Recall of 0.82. We also discuss weather-data-specific characteristics that should be taken into account for creating such a system. We hope our paper serves as a simple and effective guideline for those designing similar systems in other regions of the world.


Introduction
Weather forecasting is an important task that involves predicting future weather conditions based on current and past meteorological observations.Accurate weather forecasting not only impacts our daily lives but also plays a crucial role in potentially saving lives and resources during natural disasters.In the case of South Korea, the diversity of weather phenomena (due to its three-sided coastline and approximately 70% of the land consisting of mountainous areas) increases the significance and challenges of weather forecasting.
Meteorological experts rely on two main types of data sources for weather forecasting.The first is the Comprehensive Meteorological Information System (COMIS), which provides access to radar images, cloud images, satellite imagery, and other relevant data.COMIS has a structure similar to a typical website, featuring a hierarchical tree structure with select boxes, drop-down menus, and 1 https://www.kma.go.kr/neng/index.do

WeatherSearch
On which day did Gangwon-do in this year's summer have the highest hourly precipitation?other interactive elements.Navigating through this structure often requires multiple clicks to arrive at the final target information, and it can be time-consuming, especially when the exact location of the data is unknown.As a result, most experts tend to stick to the pages they are familiar with and rarely explore other pages and this limitation hinders the effective utilization of diverse weather data.The second data source is the Korea Meteorological Administration (KMA) database, which stores past weather observation data such as temperature, humidity, wind speed, and precipitation.The KMA database contains an incredibly extensive volume of data.Its one-minute interval data for 600 unique stations makes 900,000 data points per day, covering roughly 40 years since the 1980s.To retrieve the data from such a massive database, proficiency in using SQL queries is necessary.However, many meteorological experts, particularly newcomers unfamiliar with weather database and senior professionals who struggle with programming languages, face difficulties in using SQL queries.Consequently, they result in spending a significant amount of unnecessary time constructing queries or resort to using only basic SQL queries.
In this paper, we introduce WEATHERSEARCH, an integrated search system that allows the users to access to all the necessary weather data from the COMIS and the database through natural language queries.To the best of our knowledge, there are currently no existing search models specifically tailored to the meteorological field and the corresponding training data.Our focus has been on constructing datasets that actively incorporate the opinions of industry experts.Subsequently, using the constructed datasets, we fine-tune a pretrained mT5 model (Xue et al., 2021) to map each natural language query to a structured form.
We construct two domain-specific datasets: (1) a natural language query-SQL query dataset and (2) a keyword-URL dataset.To collect the SQL dataset, we conduct multiple expert surveys and interviews targeting 24 experts to gather responses.Based on these responses, we manually create question templates and corresponding SQL query templates.Subsequently, various techniques, including template filling (Lee et al., 2023), are applied to cover the entire scope of the database, resulting in the final dataset.The URL dataset is collected by crawling all possible URLs from the COMIS and tagging them with corresponding keywords.We preprocess the keywords to make them similar to the actual search keywords by applying useful techniques.
Through the deployment of WEATHERSEARCH to real-world meteorological experts, we anticipate the following contributions: 1. We propose an effective development pipeline for the search system that works with a vast amount of real structured data and incorporates expert opinions in weather domain.
2. Our system enables weather experts to leverage wide range of data during weather forcasting, allowing them to work more efficiently.
Through the disclosure of our methodology, we hope to offer support to those seeking to create similar systems for different regions or languages.

Related Work
Machine Learning for Weather Machine learning techniques have been increasingly applied in the meteorological domain.et al., 2018), Spider (Yu et al., 2018), CoSQL (Yu et al., 2019), and UNITE (Lan et al., 2023), which have demonstrated advancements in accurately generating SQL queries from natural language inputs.On the other hand, in the code generation, Alpha-Code (Li et al., 2022), Synchromesh (Poesia et al., 2022), and CodeRL (Le et al., 2022) have emerged as prominent approaches, showcasing their ability to transform natural language instructions into executable code.We apply semantic parsing to the generation of SQL queries and URLs (structured representation) in the weather domain.

Data Collection
We construct domain-specific datasets necessary to train the search system.As mentioned in §Section 1, weather data comes from two different sources, and each source has its own structured query for accessing data.For this reason, we build separate datasets for each source.One is the SQL dataset, which comprises pairs of natural language queries and corresponding SQL queries ( §Section 3.1).The other is the URL dataset, which consists of pairs of natural language keywords and corresponding URLs ( §Section 3.2).

SQL Dataset
SQL data collection method is based on expert surveys and interviews to ensure that the constructed dataset can be applied and closely utilized in the real world.Indeed, gathering all possible natural language queries that experts may use through surveys is inefficient and impractical.Instead, we collect responses through surveys and transform them into templates, which are then filled in accordingly (Lee et al., 2023).Figure 2 illustrates SQL data collection process.The iterative template  collection process is conducted in a total of five stages: (1) expert survey, (2) template creation, (3) template augmentation, (4) expert interview, and (5) modification.It takes approximately two weeks to complete one iteration.We repeat this four times to ensure high quality.
Expert Survey Expert surveys are conducted targeting 24 experts from the Forecasting Department of the Daejeon Regional Meteorological Administration.On average, we obtain approximately 60 responses per survey, and the collected responses comprise weather-related natural language queries (e.g.How many days did it snow in the capital area last winter?)commonly used by experts in their practical situation, including variables (e.g.snow, capital area, last winter) within the queries.The collected queries encompass a wide range of difficulties, from simple questions that find a single climatic factor, to complex questions (i.e.requiring SQL table JOIN) that involve multiple regions or multiple climatic factors, and even complex inquiries that necessitate specific conditions to be satisfied.The examples of survey responses can be found in Appendix A.
Template Creation Following the survey, threetype templates are created based on the collected responses: (1) question template, (2) SQL template, and (3) time template.Figure 3 depicts the procedure of making question templates and SQL templates.First, we represent the query templates by using placeholders (e.g.{date}, {region}) for the words that can be replaced with variables from survey responses.The details of range for each variable can be found in §Section 3.1.Once the question templates are completed, corresponding SQL templates are created, which also include variables (e.g.{date_sql}, {region_sql}) corresponding to the variables in the question templates.In the end, we construct a total of 117 question-SQL template pairs.Time expression is crucial in querying weather data as it exhibits significant diversity.For this reason, a separate time template is created to handle time expressions.Time expressions are categorized into daily, monthly, yearly, ordinal, and seasonal representations, considering their combinations.Additionally, colloquial date expressions (e.g.last, previous year, yesterday) are also included.As a result, a total of 755 time templates are obtained.All three templates are constructed and annotated manually by the authors of this paper.Appendix B provides the samples of the time templates.
Template Augmentation To accommodate the diversity of language, we employ instruction-tuned language models for template augmentation.As the survey is conducted exclusively with a small number of experts, the data might have biases (i.e.being influenced by their specific linguistic tendencies) and potentially limiting the usage of diverse vocabulary.Hence, we leverage language models such as ChatGPT2 to augment the templates, aiming to encompass vocabulary that would be used by a broader range of individuals.
Expert Interview & Modification Upon completion of template creation and augmentation, it is essential to undergo a review process involving experts.Six experts from the Forecasting Department of the Daejeon Regional Meteorological Administration are interviewed to review the data and provide feedback.The focus of the review is to examine whether the template content aligns with the actual queries used and if the variable ranges are correctly set.Following the interviews, the templates are modified based on the feedback received.This entire process constitutes one iteration.Subsequently, a new round of surveys is conducted to collect data, create templates, and receive feedback, repeating the iterative process.
Template Filling Once template collection is complete, the next step involves populating the variables within the templates with values to generate actual training data.The key aspect of this process is to fill in values that cover the entire range of the variables in the standardized templates, creating training data that closely resembles the real world and covers all possible questions.
Before proceeding with template filling, we have to explore the variables that need to be filled and their respective ranges.There are five main variables that require population: {date}, {region}, {attribute}, {extreme_expression}, and {value}.Here is a detailed description of each variable: • {date}: This variable represents the specific date or time period for which the query is being made.It is populated with one of the time templates mentioned earlier in the paper.
• {region}: This variable indicates the geographical area or location of interest for the query.In Korea, there are originally 728 observation stations nationwide.However, we categorize them into a total of 183 regions by grouping them at the provincial level (e.g.Gyeongju-si, Cheorwon-gun) and also at the metropolitan area level (e.g.Gyeongsangnamdo, Jeollabuk-do).
• {attribute}: This variable pertains to the specific climatic factor or meteorological parameter that is being queried, such as temperature, humidity, precipitation, wind speed, and snowfall amount.There are a total of 30 climatic factors.
• {extreme_expression}: This variable accounts for expressions related to extreme conditions, such as "highest" and "lowest".It covaries with the {attribute} variable, showing a significant influence.
• {value}: This variable represents the actual value or range of values associated with the {attribute} being queried.It is usually a numerical value.
Taking the characteristics and ranges of these variables into consideration, we will now proceed to the template filling stage to actually populate the values.
Show me the maximum value of {attribute2} when {attribute1} in {region} has been equal to or exceeded {value} {date}.
Show me the maximum value of humidity when daily precipitation in Gangneung has been equal to or exceeded 100mm {date}.

Question Template
Show me the maximum value of humidity when daily precipitation in Gangneung has been equal to or exceeded 100mm over the past {year} years.
Show me the maximum value of humidity when daily precipitation in Gangneung has been equal to or exceeded 100mm over the past 10 years.Template filling begins with the question template.Firstly, random values are assigned to variables other than {date} in the question template.Next, a random time template is selected from the time templates and inserted into the {date} variable.Finally, if there are variables in the selected time template, numerical values that meet the variable conditions are inserted (see Figure 4).In the case of the {value} variable and variables within the time template, it is necessary to ensure that the assigned values fall within the specified ranges.For example, if there is a {month} variable within the time template, it should be restricted to numbers between 1 and 12.In order to account for the variation in the amount of data that can be generated through template filling, we apply different weights to each template.These weights are determined based on factors such as the number of variables and the range of values within each template.The amount of data generated from a single template is then proportional to its assigned weight.

All variables except
Data Augmentation Although language diversity is ensured through template augmentation, data augmentation is also necessary for the values that go into the placeholders.There exist synonyms for meteorological terminology.If a general model does not encounter these words during training, it may not recognize them as synonyms.To address this, we create a synonym dictionary for meteorological terminology and apply it to the training data.We construct synonyms for a total of 197 words, mainly focusing on weather elements and regions (e.g.Precipitation = Rainfall = Water accumulation, Gwangju Metropolitan City = Gwangju Jeollado).

URL Dataset
Similarly to the SQL dataset collection process, the URL dataset also incorporates an extensive amount of expert opinions.However, in the case of the URL dataset, the template collection process is omitted due to the availability of pre-collected keyword-URL pairs.Instead, the data preprocessing for the URL dataset is iteratively refined through expert feedback and modifications, aiming to enhance its quality and relevance (see Figure 5).
We crawl and collect all possible URLs within the COMIS website and annotate the collected URLs with matching keywords.These collected URL-keyword pairs undergo several preprocessing steps based on expert feedback.Step (1), we add noise to the keyword data in the training set to ensure robust performance even with changes in keyword order or partial keyword omissions.Step (2), we apply a synonym dictionary to augment the data, ensuring proper functioning with synonyms of meteorological terminology.We collect synonyms for a total of 196 words (see Appendix C).Step (3), we incorporate new feedback from weather experts (users) for further improvements.Step (4), we go back to Step (1) and repeat the process again.We repeat the process twice, but the number of iterations can be increased for better alignment with the users.

Data Statistics
In our study, we collect two distinct datasets: URL dataset (keywords-URL pairs) and SQL dataset (NL-SQL pairs).These datasets contain a comprehensive range of weather-related information, showcasing their ability to capture diverse and extensive elements (see Table 1).We align the sizes of the URL dataset and the SQL dataset, aiming for a balanced distribution of data between the two.The URL dataset contains a total of 2,083,334 records and includes weather imagery and images related to radar lightning, marine conditions, and more.Users can access a variety of climate-related visuals and images through this dataset.The SQL dataset consists of a total of 1,983,800 records and allows for querying observation measurements such as temperature, humidity, and other variables from the database.Due to the ability to perform table JOIN in SQL queries ( data involving JOIN accounts for approximately one-SQL URL EM MRR Recall @5 @10 @20 Avg.@5 @10 @20 Avg.mT5 0.99 0.63 0.64 0.65 0.64 0.75 0.82 0.86 0.81 mT5 w/ C.D. 0.99 0.65 0.66 0.67 0.66 0.76 0.83 0.88 0.82 Table 2: Results of WEATHERSEARCH.Avg.represents the average of @5, @10, and @20 scores and w/ C.D. indicates "with constrained decoding" third of the entire SQL dataset), it is possible to access multiple elements simultaneously.As a result, the sum of records for individual elements may differ from the total dataset count.

Experiment
Model We use mT5 (Xue et al., 2021) as the base model for our WEATHERSEARCH system.mT5 is pre-trained on a massive corpus of multilingual text data, and possesses the capability to encode and decode both Korean and English within its outputs.In addition, to further improve accuracy, we incorporate constrained decoding following De Cao et al. (2021).Constrained decoding, utilizing a prefix tree, involves guiding the generation process during natural language generation tasks by constraining the output based on predefined rules represented in the form of a tree-like data structure.This technique ensures that the generated outputs adhere to specific patterns or formats, improving the quality and coherence of the generated text.
Evaluation Metrics For SQL query generation, we employ the Exact Match (EM) metric.On the other hand, for URL search, there may be multiple possible answers, similarly to the evaluation of canonical web search.For this reason, we use Mean Reciprocal Rank (MRR) and Recall metrics which are widely used in the search engine evaluation.
relevant recommended items all the possible relevant items where |Q| is the number of queries, rank i is the rank of correct answer, and k represents the number of outputs generated from a single query.Both MRR and recall provide valuable insights about the retrieval quality and user experience.MRR emphasizes the ranking quality of the retrieved results, giving more weight to the top-ranked items.Recall, on the other hand, focuses on the system's ability to retrieve all relevant items, ensuring comprehensive coverage.

Results
Table 2 shows the experimental results of WEATHERSEARCH on the SQL evaluation dataset and URL evaluation dataset, respectively.the experimental results.In the experiments conducted on the SQL evaluation dataset, both with and without constrained decoding, the system exhibits saturated performance with an exact match score of 0.99.When evaluating on the URL dataset, the experimental results show an average MRR of 0.64 and an average Recall of 0.81.With the addition of constrained decoding, the performance improves to MRR 0.66 and Recall 0.82, respectively.The lower performance compared to the SQL evaluation dataset is expected due to the presence of noise (i.e., some of the input keywords being missing or their order being changed) in the evaluation data.Yet, in search systems, it is more crucial to have a robust functionality even when some parts of the search query are missing.Therefore, it is important to also consider the setting with noisy.

Conclusion
We introduces WEATHERSEARCH, a model specifically designed for searching on Korean weather data.To develop the model, we construct two datasets tailored to the weather domain and finetune mT5 on the two datasets.These datasets effectively incorporate the expertise and feedback from the weather experts, making the model directly applicable to real-world scenarios.The experimental results demonstrate sufficiently accurate performance for deployment across various metrics.Future work includes conducting human evaluation for qualitative assessment of the system.WEATHERSEARCH is expected to provide valuable assistance in accessing and utilizing weather data to the weather experts, ultimately improving decisionmaking process and productivity for weather forecasting.

Limitations
Since we generate datasets based on templates, there may be grammatical errors or inconsistencies as mismatches in prepositions and postpositions.
Although this issue can be resolved by manually editing the data later, for now, it is not changed since it does not significantly impact the model's performance during training.
The current evaluation datasets used in the experiments are created in a similar manner to the training datasets, which might have resulted in relatively favorable results.However, we anticipate that there could be a gap between these results and the actual user experience in real-world scenarios.To bridge this gap, we need to collect new evaluation data comprising actual search queries used by users and conduct human (weather expert) evaluations to further refine and validate the system's performance.
Due to security concerns regarding national data, we cannot publicly disclose the original training data.However, in the future, there is a possibility of releasing the data through data masking techniques.By applying data masking, we can enhance the security of the data while preserving its original characteristics, allowing for potential public release while safeguarding sensitive information.

Figure 3 :
Figure 3: The procedure of making question template and SQL template.
{date} are filled with values.{date} variable is filled with time template.The remaining numerical values are filled.

Figure 4 :
Figure 4: The procedure of template filling.

Figure 5 :
Figure 5: Process of collecting URL dataset.

Table 1 :
Composition of weather elements in URL and SQL dataset.