AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets

High-quality data is essential for conversational recommendation systems and serves as the cornerstone of network architecture development and training strategy design. Existing works devote heavy human effort to manually labeling or to designing and extending recommender dialogue templates. However, they suffer from two problems: (i) the limited number of human annotators means the datasets can hardly capture the rich, large-scale cases found in the real world; (ii) the limited experience and knowledge of annotators lead to an uninformative corpus and inappropriate recommendations. In this paper, we propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues through a Data2Text generation process, where unstructured recommendation conversations are generated from structured graphs based on real-world user-item information. In doing so, we comprehensively exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets. Extensive experiments validate the benefit brought by the automatically synthesized data under low-resource scenarios and demonstrate its promising potential to facilitate the development of more effective conversational recommendation systems.


Introduction
Conversational recommendation (CR) systems aim to recommend potential items of interest to users (or seekers) through dialogue-based interactions. Although tremendous work has been contributed to the CR domain, the lack of both large-scale and high-quality training data remains a common problem due to the great cost and difficulty of dataset construction. A classic recommendation dialogue collection (Li et al., 2018) relies on a human recommender to chat with a randomly paired seeker and supply some recommendations within several conversation turns, usually based on the chatting content. A dataset constructed under this paradigm is not only limited in scale but also can hardly ensure recommendation quality. Specifically, it suffers from: (i) the limited number of human annotators, which means the datasets can hardly capture rich, large-scale real-world cases; (ii) the limited experience and knowledge of annotators, which lead to an uninformative corpus and inappropriate recommendations. In addition, the preference an annotator assigns to a recommended item may be "unreal" when (s)he is unfamiliar with it but cannot validate the annotation in time. The performance of a CRS trained with such datasets may be barely satisfactory when applied in real-world scenarios.

Figure 1: The proposed approach takes three kinds of sources, namely user-item matrices, knowledge graphs, and existing conversational recommendation datasets, to automatically generate recommendational dialogues.
Although there exist numerous recommendation datasets that contain more "real-world" user preferences, e.g., MovieLens (Harper and Konstan, 2015), they come with few or even no corresponding dialogues, which leads to a low-resource scenario for CRS training. Therefore, we propose a novel CR data synthesis approach, AUGUST, an AUtomatic Generation UnderSTudy for conversational recommendation datasets. The core of our approach is synthesizing the strengths of three kinds of data resources: (i) user-item ratings from websites, which provide items really favored by each user; (ii) external knowledge, which provides rich item-related information leading to a "professional" recommender; (iii) an abundant dialogue corpus, which helps develop the learning model's conversation ability. Note that all three can be easily accessed, facilitating the potential of generating large-scale, diverse recommendational dialogues. In doing so, our approach contains two steps: (1) to form one data sample, selecting some items rated by one user, from which a graph is constructed that contains the items, related entities, and their relations based on a well-developed knowledge graph (KG); (2) adopting a Data2Text generator (Li et al., 2021) to convert the item graph into a fluent and natural dialogue around the items. Such a graph-based dialogue generation manner is endowed with great extensibility and explainability, where external knowledge can be integrated by expanding the intermediate graph with related entities from the KG. To train the Data2Text module, we make use of recommendational dialogues from existing CR datasets to learn a dialogue generator. Specifically, we elicit graphs from dialogues just as we do from user-item ratings, and train the Data2Text generator to take the graph as input and recover the original dialogue.
We conduct extensive experiments on the synthesized data quality and the performance of Data2Text generation, and give a detailed analysis of problems in the synthesis process. We also empirically validate the benefit of the synthesized data in helping learn a stronger CRS, especially in recommendation accuracy under the low-resource scenario. Along with the rapid development of Data2Text generation methods, the proposed AUGUST is of great potential and provides a new solution for constructing large-scale CR datasets, which is our main contribution. In addition, it is expected to attract more attention to the direction of automatic dataset generation and to facilitate data-driven learning models designed not only for CR but also for various other tasks in the future.

Conversational Recommendation Dataset
Recently, Conversational Recommendation Systems (CRS) (Li et al., 2018; Chen et al., 2019; Jannach et al., 2021; Lu et al., 2021) have become an emerging research topic that aims to provide high-quality recommendations to users through natural language. To facilitate the study of this task, some works collect human-human and human-machine conversation data by asking human annotators to converse under certain rules. Gao et al. point out that existing datasets are not qualified to develop CRS that satisfy industrial application requirements for two reasons: 1) the scale of these datasets is not enough to cover real-world entities and concepts; 2) datasets constructed under rigorous constraints can hardly generalize to complex and diverse real-world conversations. Therefore, more efforts are encouraged to develop large-scale, generalizable, and natural datasets for CRS.

Data2Text Generation
Data2Text Natural Language Generation (NLG) is the computational process of generating meaningful and coherent natural language text to describe non-linguistic input data. The input can take various forms such as databases of records, spreadsheets, knowledge bases, and simulations of physical systems. Traditional methods for Data2Text generation (Reiter and Dale, 2000) implement a pipeline of modules including content planning, sentence planning, and surface realization. With the rapid development of Seq2Seq models, especially pre-trained models, recent neural generation systems (Li et al., 2021) trained in an end-to-end fashion achieve state-of-the-art results on Data2Text benchmarks such as WebNLG (Gardent et al., 2017), ToTTo (Parikh et al., 2020), and AGENDA (Koncel-Kedziorski et al., 2019). One of the most popular subtasks, Graph2Text, aims to create fluent natural language text describing an input graph. Early works mainly center around statistical methods, applying grammar rules to generate text (Konstas and Lapata, 2013). Recently, neural-network-based approaches have been proposed to generate text from linearized KG triples (Ferreira et al., 2019), some of which investigate how to explicitly encode graph structural information using Graph Neural Networks (GNNs) (Scarselli et al., 2008) and Transformers (Koncel-Kedziorski et al., 2019). Unsupervised methods (Guo et al., 2020) and few-shot problems (Li et al., 2021) are also explored. In our approach, we adopt a Graph2Text generator for CR data synthesis.

Preliminaries
Our CR dataset synthesis approach produces recommendational dialogues from three kinds of resources: user-item matrices from traditional recommendation datasets, external knowledge graphs, and existing CR datasets. We first introduce the related notations. A user-item matrix (UIM) M (supplied by datasets like MovieLens (Harper and Konstan, 2015)) consists of N rows and M columns, of which the i-th row represents the ratings of the i-th user U_i towards all M items, and each element s_ij ∈ [1, 2, 3, 4, 5] represents the i-th user's rating score towards the j-th item o_j, where a higher score represents the user's stronger preference for an item. Note that the matrix M may be sparse depending on the number of ratings given by each user.
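As a concrete illustration, a sparse UIM like the one described above can be stored as a dict-of-dicts keyed by user and item; the helper name and sample ratings below are hypothetical, chosen only to mirror the <user, item, rating> structure:

```python
from collections import defaultdict

def build_uim(ratings):
    """Build a sparse user-item matrix M from (user, item, score) tuples.

    Stored as a dict-of-dicts since M is typically sparse: most users rate
    only a small fraction of the M items. Illustrative sketch, not the
    paper's implementation.
    """
    M = defaultdict(dict)
    for user, item, score in ratings:
        assert score in {1, 2, 3, 4, 5}, "ratings s_ij lie on a 1-5 scale"
        M[user][item] = score
    return dict(M)

# Hypothetical sample data in MovieLens-style tuple form.
ratings = [("u1", "Inception", 5), ("u1", "Titanic", 3), ("u2", "Inception", 4)]
uim = build_uim(ratings)
```

Rows of this structure correspond to rows of M; missing entries simply have no key, which is the sparse case noted above.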
A knowledge graph G = <E, R>, e.g., DBpedia (Auer et al., 2007), where E and R are the entity set and relation set, respectively. The graph consists of large amounts of entity-relation-entity triples (e_i, r_ij, e_j), of which e_i or e_j can be an item or non-item entity from E, and r_ij ∈ R represents the relation category between the associated entity pair. We denote the item entity set as O ⊂ E, which contains all recommendation candidates. In a CR dataset, e.g., the ReDial dataset (Li et al., 2018), a conversation is generated for recommendations in a certain domain (movie, traveling, restaurant, etc.) within a seeker-recommender pair. Denoting the i-th conversation as C_i, a seeker/user U_i asks for item recommendations from a recommender R_i. In the following chatting turns, U_i may express his/her preferences explicitly or implicitly, and R_i is expected to capture the user's preferences according to the historical dialogue context, denoted as {c_1, ..., c_t}, where t is the historical turn number and c_j is the j-th conversation utterance.

Dataset Synthesis
The proposed dataset synthesis approach starts from real-world user preference information easily accessed from the UIM M. Then a UIM → Graph → Dialogue generation pipeline is adopted to synthesize recommendational dialogues, with the overview shown in Fig. 2.
UIM → Graph The first step is to convert the UIM that contains user preferences into graphs. From any row i of M, a set of items with respective ratings {(o_j, s_ij)} can be taken to generate a dialogue sample. All o_j are used as nodes to construct the graph G'_i. To integrate the user preferences into G'_i, an extra node for the user u_i, with its relation to each item node, is added to constitute triples like (u_i, s_ij, o_j) for item o_j. Furthermore, we extend G'_i by incorporating rich external knowledge from G to improve the informativeness of the final dialogue output. Specifically, for each pair of items o_j and o_k, we search for a path of at most two hops in G to find their relations, i.e., two movies are either directly linked (neighbouring) as (o_j, r_jk, o_k) (e.g., belonging to one movie series) or linked through one entity e_l as (o_j, r_jl, e_l, r_lk, o_k) (e.g., sharing the same director, actors, or genre). These triples in the searched paths are then added into G'_i. The obtained graph G'_i can better represent the selected items from the UIM data by incorporating both accurate user-preference information and knowledge-equipped inter-entity relations.
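A minimal sketch of this UIM → Graph step follows. The function name, toy triples, and in-memory triple set are all hypothetical; a real system would query a KG such as DBpedia, and this sketch only follows triples in their stored direction:

```python
def uim_row_to_graph(user, rated_items, kg_triples):
    """Construct G'_i from one UIM row plus KG links among its items.

    rated_items: list of (item, score) pairs from row i of the UIM.
    kg_triples: iterable of (head, relation, tail) triples standing in
    for the knowledge graph G.
    """
    # User-preference triples (u_i, s_ij, o_j).
    graph = [(user, score, item) for item, score in rated_items]
    items = [item for item, _ in rated_items]

    # Index KG triples by head entity for cheap neighbour lookup.
    nbrs = {}
    for h, r, t in kg_triples:
        nbrs.setdefault(h, []).append((r, t))

    # Search one- and two-hop paths between each pair of items.
    for i, oj in enumerate(items):
        for ok in items[i + 1:]:
            for r, t in nbrs.get(oj, []):
                if t == ok:                          # directly linked items
                    graph.append((oj, r, ok))
                else:                                # linked via entity e_l
                    for r2, t2 in nbrs.get(t, []):
                        if t2 == ok:
                            graph.append((oj, r, t))
                            graph.append((t, r2, ok))
    return graph
```

The two-hop branch adds both triples of the path (o_j, r_jl, e_l, r_lk, o_k), so shared directors, actors, or genres become explicit edges in G'_i.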
Graph → Dialogue Given a graph G'_i that represents the items expected to appear in the dialogue, we cast dialogue synthesis as a Data2Text problem: a Data2Text generator takes the graph as input and outputs raw text containing the vertex and edge information of the graph. Note that two tokens, [U] (user) and [R] (responder), are specially defined to be generated in the text, such that the sentences after [U] ([R]) and before the next [R] ([U]) token can be viewed as a single turn. In this way, the text can be decomposed and re-organized into a multi-turn dialogue. Considering that there is no supervision (graph-dialogue data pairs) for learning the generator in this Data2Text process, we utilize the conversation corpus in existing CR datasets to learn a strong generator for dialogue synthesis, which is introduced in the following subsection.
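The re-organization of raw generator output into turns can be sketched by splitting on the [U]/[R] tokens defined above; the speaker labels and sample text are illustrative:

```python
import re

def text_to_turns(raw_text):
    """Re-organize generator output into a multi-turn dialogue.

    Sentences after a [U] ([R]) token and before the next [R] ([U])
    token are treated as one turn, as described in the paper.
    """
    parts = re.split(r"(\[U\]|\[R\])", raw_text)
    turns, speaker = [], None
    for part in parts:
        part = part.strip()
        if part in ("[U]", "[R]"):
            speaker = "user" if part == "[U]" else "recommender"
        elif part and speaker:
            turns.append((speaker, part))
    return turns

turns = text_to_turns(
    "[U] Hi, I like sci-fi movies. [R] Then you may enjoy Inception (2010). [U] Thanks!"
)
```

Each element of `turns` is one (speaker, utterance) pair, i.e., one dialogue turn.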

Data2Text Generation
In order to generate both natural and logical dialogues from item-related graphs, we adopt a Data2Text generator to learn the conversation knowledge in existing CR datasets for Graph → Dialogue generation. As illustrated in Fig. 3, an encoder-decoder architecture is implemented with an R-GCN encoder (Schlichtkrull et al., 2018) for graph feature extraction and a pre-trained language model (PLM) decoder (Lewis et al., 2020) for dialogue generation.
Graph Construction and Encoding Given any dialogue sample C_i in existing CR datasets, we construct a graph G'_i to produce a graph-dialogue training pair for learning a strong Data2Text generator. To construct G'_i from C_i, we first search for all entities {e_j} along with the speaker's (U_i or R_i) sentiment {s_ij} towards them (usually provided by CR datasets or produced by an estimator), and link each e_j with the corresponding node in G. Then a graph G'_i can be constructed in a similar way as in the UIM → Graph process described in Sec. 3.2. Given a constructed G'_i, an R-GCN (Schlichtkrull et al., 2018) is applied as the encoder to generate entity embeddings for G'_i. Let ϕ_j ∈ R^d denote the entity embedding for a general entity e_j in the KG, where d is the embedding size. The R-GCN then leverages the multi-relational information to obtain a structure-aware graph representation. Specifically, the embedding of e_j at the (l+1)-th of L total layers can be computed as:

ϕ_j^{l+1} = σ( Σ_{r∈R} Σ_{e_k∈N_j^r} (1/|N_j^r|) W_r^l ϕ_k^l + W_0^l ϕ_j^l ),

where σ(·) is the activation function, W_r^l and W_0^l are trainable parameters, and N_j^r is the set of neighbouring entities of e_j under relation r. Note that all ϕ_j^0 before the first layer are initialized with the pre-trained KG embeddings of (Yang et al., 2014). The entity embeddings {ϕ_j^L} output by the last R-GCN layer are re-denoted as {ϕ_j} for simplicity.
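One R-GCN propagation step can be sketched in plain numpy. The 1/|N_j^r| normalization and the ReLU activation are common choices from Schlichtkrull et al. (2018) and are assumptions here, not necessarily the paper's exact settings:

```python
import numpy as np

def rgcn_layer(phi, neighbours, W_r, W_0):
    """One R-GCN propagation step over entity embeddings.

    phi: (n, d) array of entity embeddings phi_j^l.
    neighbours: dict mapping relation r -> {j: list of neighbour indices
        N_j^r}.
    W_r: dict mapping relation r -> (d, d) weight matrix W_r^l.
    W_0: (d, d) self-loop weight matrix W_0^l.
    Normalization 1/|N_j^r| and sigma = ReLU are assumed choices.
    """
    out = phi @ W_0.T                         # self-loop term W_0 phi_j
    for r, adj in neighbours.items():
        for j, nbr in adj.items():
            if nbr:
                # (1/|N_j^r|) * sum_k phi_k, then project with W_r
                msg = phi[nbr].mean(axis=0)
                out[j] += W_r[r] @ msg
    return np.maximum(out, 0.0)               # sigma(.) as ReLU
```

Stacking L such calls, with phi^0 set to pre-trained KG embeddings, yields the structure-aware representations {ϕ_j^L} described above.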
Graph Feature Learning To learn higher-quality graph features for smoother decoding, we leverage another encoding branch, a pre-trained language model (PLM), to learn context-aware node features and align the graph-encoded features with them. Specifically, by taking the whole dialogue as PLM input, entities are represented with contextual information from natural utterances, so that the rich knowledge in the PLM can be adapted. Denote the context-aware entity embedding output by the PLM branch as φ_j ∈ R^d, which has the same dimension as the R-GCN embedding. The alignment between the two types of entity feature vectors is implemented by minimizing an l2 loss, denoted as L_align:

L_align = Σ_j ||ϕ_j − φ_j||_2^2.

Before feeding the graph node features into the decoder, we linearize them into an entity sequence {ϕ_j} through a relation-biased breadth-first search (RBFS) strategy following (Li et al., 2021), where a breadth-first search is adapted and an RBFS weight α_j is computed for each node e_j, based on the relation to its parent node e_i in the search process, as a score deciding the order within each search level. In the same search level, the node with a higher RBFS score takes an earlier position in the sequence. For related implementation details, please refer to (Li et al., 2021).
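The alignment objective amounts to an l2 distance between the two entity embedding tables; a minimal sketch, where the mean-over-entities reduction is an assumption (the paper may sum instead):

```python
import numpy as np

def align_loss(phi_graph, phi_plm):
    """L_align: l2 alignment between R-GCN embeddings phi_j and the
    PLM's context-aware embeddings.

    Both arguments are (n, d) arrays with matching entity order.
    Squared l2 distance per entity, averaged over entities (assumed
    reduction).
    """
    return float(np.mean(np.sum((phi_graph - phi_plm) ** 2, axis=-1)))
```

Minimizing this loss pulls each graph-encoded vector toward its context-aware counterpart, which is how the PLM's knowledge is transferred to the graph branch.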
Dialogue Decoding In the decoding stage, a PLM decoder decodes the linearized graph features {ϕ_j} into a textual dialogue. To formalize dialogue generation as a typical natural language generation problem, we sequentially connect all utterances into a single paragraph, with the special tokens as separators for regrouping into dialogue turns. Denoting the k-th of K total tokens as w_k, the generation objective is to minimize the negative log-likelihood:

L_gen = − Σ_{k=1}^{K} log P(w_k | w_{<k}, {ϕ_j}),

where P(·) denotes the probability function. To encourage covering the entities from the input graph, a copy mechanism implemented with a pointer network is adopted, leading to a copy loss term L_copy.
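The generation objective is a standard token-level negative log-likelihood; given the decoder's per-token probabilities P(w_k | w_<k, {ϕ_j}), it can be computed as follows (the function name is illustrative):

```python
import math

def nll_loss(token_probs):
    """L_gen = - sum_k log P(w_k | w_<k, {phi_j}).

    token_probs: sequence of K probabilities, one per target token,
    as produced by the decoder for the ground-truth dialogue tokens.
    """
    return -sum(math.log(p) for p in token_probs)
```

A perfectly confident decoder (all probabilities 1.0) yields zero loss; any uncertainty adds a positive penalty.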
The overall objective function to learn the domain-adaptive encoder-decoder can be written as:

L = L_gen + λ_1 L_align + λ_2 L_copy,

where λ_1 and λ_2 are weight factors balancing the different loss terms.

Experiments

Datasets

To validate the Data2Text generation quality of AUGUST, we construct graph-dialogue pairs from the ReDial (Li et al., 2018) and WebNLG (Gardent et al., 2017) datasets for training and evaluation.

(1) The ReDial dataset (Li et al., 2018) pairs human annotators as the recommender and user to produce a conversation covering at least 4 different movies. Every movie mentioned in the dialogue is annotated explicitly. ReDial contains 10,021 conversations related to 64,362 movies and is split into training, validation, and test sets with a ratio of 8:1:1.

(2) The MovieLens dataset (Harper and Konstan, 2015), released by GroupLens Research, describes people's expressed preferences for movies. These preferences take the form of <user, item, rating, time-stamp> tuples, where the rating (1∼5) represents the user's preference for a movie at a particular time. The preferences are collected by the MovieLens website, a recommender system that asks its users to give movie ratings in exchange for personalized movie recommendations.

(3) The DBpedia knowledge base (Auer et al., 2007) contains structured knowledge extracted from Wikipedia. It collects rich movie-related information and inter-movie relations and releases an open knowledge graph available to the public.

Considering the limitations of existing datasets as stated in Sec. 1, we create a small dataset with more "real-world" and reliable recommendations for CR evaluation. We sample 200 pieces of user-item data from MovieLens and hire annotators to create conversations according to the user preferences for the movies, named "ML-G2D" in Tab. 1. We also provide annotators with external knowledge (e.g., movie websites) and ReDial dialogue samples as references to guarantee conversation quality. Note that when testing on WebNLG in Tab. 1, we use WebNLG as the dialogue resource to train the Data2Text generator in AUGUST, while when testing on ReDial and ML-G2D, we use ReDial as the dialogue resource. To validate the benefit of the data synthesized by AUGUST, we run experiments using our synthesized data as training data for the CR task. Note that to compare the benefit brought by the synthesized data and the ReDial data, we randomly sample around 8,000 pieces from the synthesized data for the later training of KGSF, which keeps the same scale as the ReDial training data. The synthesized data is denoted as "AUGUST" in Tab. 3 and 4.

Evaluation Metrics
To investigate the performance of various methods on the Data2Text generation task, we first evaluate the quality of conversation reconstruction. We adopt four automatic evaluation metrics widely used in Data2Text generation tasks (Li et al., 2021): BLEU (Papineni et al., 2002) and ROUGE-L (Lin, 2004), which compute the overlap ratio of n-grams between the reconstructed dialogue and the original one; CIDEr (Vedantam et al., 2015), which computes TF-IDF weights for each n-gram in synthetic/real dialogues; and Chrf++ (Popović, 2017), which computes the average F-score on both character-level and word-level n-grams. In addition, we compute the recall ratio (Recall) of entities to measure how many entities from the graph input are recovered in the dialogue. For conversation writing quality, we compute Dist-n (Li et al., 2015) to show the distinctness of the generated utterances, and the perplexity (PPL) proposed in (Jelinek et al., 1977) to measure language fluency. Besides, we conduct a human evaluation of generation quality following previous works (Li et al., 2021; Agarwal et al., 2021), in which three workers rate 200 randomly sampled dialogues with respect to language naturalness, covering fluency, dialogue logic, and informativeness (5 is the full score). For the evaluation of a CRS trained on the data synthesized by AUGUST, we follow (Li et al., 2018; Chen et al., 2019; Zhou et al., 2020a) and use Recall@k (R@k, k = 1, 10, 50) as the recommendation metric, which indicates whether the predicted top-k items contain the ground-truth recommendation provided by human recommenders. The generation quality of the CRS is evaluated with Dist-n and PPL as in the Data2Text generation task.
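Among these metrics, Dist-n is simple enough to sketch directly; the whitespace tokenization below is an assumption, as actual evaluation scripts may tokenize differently:

```python
def dist_n(utterances, n):
    """Dist-n (Li et al., 2015): ratio of distinct n-grams to total
    n-grams across the generated utterances. Higher values indicate
    more diverse generation.
    """
    ngrams = []
    for utt in utterances:
        toks = utt.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

For example, the utterance "a b a b" has three bigrams but only two distinct ones, giving a Dist-2 of 2/3.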

Implementation Details
In the Data2Text generation step, the graph encoder in AUGUST is implemented as a two-layer R-GCN with an embedding size of 1,024. The PLM encoder for context-aware entity embeddings adopts the encoder of a pre-trained BART-large (Lewis et al., 2020), a transformer-based model with a bidirectional encoder and an autoregressive decoder. The initial weights are provided by Hugging Face and are frozen during training. For the text decoder, we employ the decoder of a BART-large initialized with pre-trained weights for dialogue generation. The parameters of the R-GCN encoder and BART decoder are optimized using the AdamW (Loshchilov and Hutter, 2017) optimizer with a learning rate of 10^-5. The weight factors λ_1 and λ_2 are both set to 0.8. The whole network is trained on 4×23GB NVIDIA Tesla P40 GPUs with a minibatch size of 16. To validate the benefit of the data synthesized by AUGUST, we implement a popular CRS, KGSF (Zhou et al., 2020a), as the baseline, which incorporates two KGs, ConceptNet (Speer et al., 2017) and DBpedia (Auer et al., 2007), to enhance the data representations. Implementation details can be found in the code released by Zhou et al. (https://github.com/Lancelot39/KGSF).

Data2Text Evaluation
We provide both automatic and human evaluations of the generation quality of AUGUST. For the automatic evaluation, we run AUGUST with BART-large as the PLM on all three datasets to construct a benchmark for future related works.
As shown in Tab. 1, with the same training data, AUGUST performs worse on ML-G2D than on Re-G2D, which may result from the distribution bias between ReDial data and real-world user preferences as stated in Sec. 1. Besides, the PPL values are low in all settings, indicating high generation confidence, which may result from the consistency of the generation objective between BART pre-training and Data2Text training. Performance on WebNLG is higher than on the other two datasets over all metrics except PPL, because the target text in WebNLG is usually shorter and richer in common entities, and the input has fewer triples, which reduces the generation difficulty. Besides, we also directly compare the quality of the data synthesized by AUGUST and the ReDial data on "Distinctness" and "Language Naturalness" in Tab. 2. We compute the Dist-2 and Dist-3 scores and conduct a human evaluation of dialogue logic, fluency, and informativeness, which shows that the synthesized data has a quality close to the ReDial data in both utterance distinctness and language naturalness.

CR Evaluation
We evaluate the CR performance of KGSF on the ML-G2D test set when trained with different types of data (Tab. 3). Notably, ReDial and AUGUST data are complementary in providing a richer corpus for improving the conversation capability of a CRS, and adding AUGUST data also leads to higher recommendation accuracy than using ReDial data only.
We also evaluate the recommendation performance of KGSF on the ReDial test set when trained with ReDial and/or AUGUST data. As shown in Tab. 4, it can be seen that: (i) the recommendation accuracy of KGSF is low without any training data; (ii) adding synthesized AUGUST data brings a performance gain, approaching but remaining below that of adding real ReDial training data; (iii) simply adopting joint training with ReDial and AUGUST data only obtains performance similar to using ReDial data alone; (iv) using AUGUST data for pre-training and fine-tuning on ReDial data brings an extra performance gain. The results of (ii) further confirm both the benefit of the data synthesized by AUGUST and the distribution bias between ReDial recommendations and real-world user preferences. In addition, although simply using both data jointly for training can hardly bring a performance gain, as in (iii), given the distribution bias, the synthesized AUGUST data can still help improve the recommendation ability of KGSF when used for pre-training followed by fine-tuning on ReDial data. In this way, the AUGUST data provide a better initialization for the optimization of KGSF, and fine-tuning on ReDial data guarantees distribution consistency. This also shows the great potential of AUGUST as a data synthesis approach for better parameter initialization in CRS.

Error Analysis
We summarize three types of errors that appear in our generation according to the hierarchy of dialogue requirements, with one example shown in Fig. 4. Error Type I: Format Errors, including grammar and spelling mistakes, or an unexpected writing format, e.g., each utterance is expected to start with the identity "[U] says:", while the model may generate "[U] thinks". Error Type II: Hallucination, a common problem in language generation tasks. It means the network (i) generates content that conflicts with the input data, e.g., producing wrong relations, entities, or sentiments, or (ii) generates extra items beyond the input, so that the output is not a precise description of the input, e.g., "Hancock (2008)" in Fig. 4. Error Type III: Incoherent Logic, which refers to incoherent or contradictory logic in the generated dialogue, e.g., the user says (s)he has not seen a movie but liked it.

Conclusion
This paper proposes an automatic generation understudy for conversational recommendation datasets. By casting the dialogue synthesis process as a Data2Text generation task, a baseline framework is constructed to exploit (i) rich, accurate user preferences from user-item matrices, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability from the corpus of existing CR datasets. Experimental results show that our generation is comparable to human-labeled conversations and superior in scalability, extensibility, and explainability. More importantly, we empirically show the benefit of our synthesized data in improving a CRS, especially in recommendation accuracy. The proposed approach exhibits great potential for automatic dataset synthesis and is expected to inspire researchers in other fields.

Limitations
The limitations of this work mainly lie in two aspects: (i) The synthesis quality is determined by the performance of existing Data2Text approaches, while Data2Text generation is still a difficult task awaiting deeper exploration; the common errors in generation are discussed in the error analysis. (ii) We adopt a PLM as the decoder in Data2Text generation in order to generate fluent utterances. However, as stated in (Ribeiro et al., 2021), PLMs tend to pay more attention to sentence fluency than to the graph structure of the inputs, which may cause the loss of some critical information.

Figure 2: The overview of the proposed AUGUST framework for automatic recommendational dialogue synthesis.

Figure 3: The illustration of the used encoder-decoder architecture for Data2Text generation.

Figure 4: Visualization of a generation case by AUGUST for error analysis.
Hayati et al. manually annotate each utterance with sociable strategies to validate the effectiveness of sociable recommendation strategies in CRS. Moon et al. present a parallel dialog↔KG corpus where each mention of an entity is manually linked with its corresponding KG paths. Liu et al. create a multi-type dialogue dataset in which bots are expected to proactively and naturally lead a conversation from a non-recommendation dialogue to a recommendation dialogue. Similarly, Zhou et al. propose a topic-guided CR dataset to support research on topic transitions. However, all of these collections still rely on intensive manual annotation.

Table 1: Performance of Data2Text generation on three datasets. B-n denotes BLEU-n and R-L denotes ROUGE-L.

Among the annotated 200 dialogues, 100 are randomly sampled and used for training in the low-resource scenario, and the other 100 are set as the test set.

Table 3: Performance on the ML-G2D test set when incorporating different types of training data, including ReDial training data, AUGUST synthesized data, and the ML-G2D training set.

Table 4: Recommendation accuracy on the ReDial test set when trained on the ReDial and/or AUGUST data.