Knowledge-Grounded Dialogue Generation with Term-level De-noising

Dialogue generation has been improved through injecting knowledge into generative models. However, adding knowledge through simple selection of sentences or paragraphs is likely to introduce noise and diminish the effectiveness of the generative models. In this paper, we present a novel Knowledge Term Weighting Model (KTWM) that incorporates term-level de-noising of the selected knowledge. KTWM includes a module for generating Simulated Response Vectors (SRVs) and uses the attention distributions between SRVs and the knowledge embeddings to determine knowledge term weights. Our experiments demonstrate that KTWM, combined with various knowledge selection algorithms, consistently achieves statistically significant improvements over methods without term weighting when applied to two publicly available datasets, Wizard of Wikipedia (Wiz) and Holl-E. The results are particularly improved for the Wiz test data with unseen topics, demonstrating the robustness of the KTWM noise-reduction approach.


Introduction
Research in dialogue generation has rapidly evolved from sequence-to-sequence (Sutskever et al., 2014) and Transformer models (Vaswani et al., 2017) to approaches with pre-trained models such as BERT (Devlin et al., 2019), XLNet (Yang et al., 2019) and T5 (Raffel et al., 2020). More recently, it has included techniques that use knowledge, in addition to the original posts, to improve the quality of the generated responses (Ghazvininejad et al. (2018), Moghe et al. (2018), Dinan et al. (2019), Galley et al. (2019), Lian et al. (2019), Zheng and Zhou (2019), Zhao et al. (2020a), Zhao et al. (2020b)). This approach is referred to as knowledge-grounded dialogue generation and is the primary concern of this paper.

Table 1: An example of knowledge term weighting by KTWM and the resulting generated response.
Post: I am a big fan of education. I think people don't realise how important it is.
Ground-truth response: Sure, education is important since it facilitates learning and the acquisition of skills.
Knowledge terms weighted by KTWM: Education is the process of facilitating learning, or the acquisition of knowledge, skills, values, beliefs, and habits.
Response generated by KTWM: I agree. Education is a great way to learn about facilitating learning.
In particular, we consider the key issue of effectively incorporating the selected knowledge into the generation process. For example, Weston et al. (2018) apply a retrieve and refine method to expand the post with the retrieved knowledge and then use it in the generation process. Lian et al. (2019) consider the post and response posterior distributions and the post prior distribution to train jointly the model for knowledge selection and response generation. Kim et al. (2020) view the knowledge selection as a sequential decision problem, first selecting the best ranked knowledge using a sequential latent variable model, and then generating a response based on selected knowledge.
To the best of our knowledge, all prior approaches focus on the selection and injection of knowledge at the sentence or paragraph level. However, that makes it hard to control for potential response noise, i.e., for the inclusion of non-relevant words, and previous studies (Galley et al. (2019), Zheng et al. (2020)) have shown that adding noise can decrease the response generation quality. Therefore, it is important to investigate whether and how we can adjust the contributions of terms in the selected knowledge. Prior research has not considered that issue systematically. (Prior work uses different names for the two dialogue roles, e.g., 'answer', 'response', and 'target'. In this paper we call the first role the 'post' and the second role the 'response', and we aim to generate the response for the given post.)
Our paper fills this gap by introducing a novel Knowledge Term Weighting Model (KTWM) for dialogue generation, which effectively estimates term weights of the injected knowledge and incorporates such weights into the response generation. The response generation thus benefits from such nuanced term-level knowledge weighting, promoting important knowledge terms rather than treating all the terms in the selected sentences equally. In Table 1 we show an example of the KTWM term weighting and its generated response: the terms 'education', 'is', 'facilitating' and 'learning' are correctly given higher weights, as they do appear in the ground-truth response, while the words 'values' and 'beliefs' are correctly assigned lower scores.
We conducted an extensive range of experiments with KTWM on two publicly available datasets: Wiz (with seen and unseen test topics) (Dinan et al., 2019) and Holl-E (Moghe et al., 2018). KTWM performs consistently well with different selections of knowledge, specifically with Post-KS (Lian et al., 2019), SKT (Sequential Latent-Knowledge Selection) (Kim et al., 2020) and TED (Transformer with Expanded Decoder) (Zheng and Zhou, 2019). Our work achieves both a superior performance in knowledge-grounded dialogue generation and new insights into the impact of the knowledge term weighting on that performance. The code of our method is publicly available at https://github.com/tonywenuon/acl2021 ktwm and enables reproducibility of our results.

Related Work
The knowledge-grounded dialogue generation can be tackled by decomposing it into two subproblems: (1) selecting knowledge from a large pool of candidates (knowledge selection), and (2) generating a response from the selected knowledge and context (knowledge-grounded response generation).

Knowledge-grounded Response Generation
Ever since the knowledge-based dialogue generation task was released by DSTC-7 (Galley et al., 2019), research interest in the topic has been steadily growing. Ghazvininejad et al. (2018) proposed a multi-task learning approach to produce responses. The posts and knowledge are used in the encoders and share the same decoder parameters. Luan et al. (2017) expanded the scope and introduced personality information into the model. They assumed that the trainable parameters can potentially capture persona from the non-conversational data (Tweets). Yavuz et al. (2019) adopted pointer-generator networks within a hierarchical framework that enabled them to include external knowledge in addition to the context. Ye et al. (2020) proposed a latent variable based generative model, which contains a joint attention mechanism conditioned on both context and external knowledge. Li et al. (2019) applied a deliberation network to create a two-stage generative model that combines both context and knowledge and, in the second generation stage, makes use of the outputs from the first stage. Zheng and Zhou (2019) proposed Transformer with Expanded Decoder (TED) architecture that assigns different weights to different knowledge sources and incorporates them into the generation process.
While the above approaches and models focus on incorporating knowledge and context to generate responses, they do that at the sentence or paragraph level. Our work deals with the quality of the incorporated knowledge at the term level, weighing all the individual knowledge terms when generating the responses.
Knowledge Selection

Considering the mechanisms for response generation, Weston et al. (2018) proposed to retrieve candidate content from a knowledge set and use it to expand the post; the result is a truncated sequence that represents a refined post. Lian et al. (2019) select the knowledge by approximating the prior distribution (i.e., p(knowledge|post)) with the posterior distribution (i.e., p(knowledge|post, response)) and then inject it into the decoder. Kim et al. (2020) trained a knowledge selection module and a response generation module jointly, but treated the knowledge selection as a sequential decision problem, using input and knowledge from the previous turns to select the knowledge in the subsequent turns. Zheng et al. (2020) separated the knowledge selection process from the generation process so that all the downstream generation tasks can use the selected knowledge. They mapped posts to the best knowledge representations in both the training and the testing phase, and used the learned models to rank new post-knowledge pairs.
In this context, KTWM can be viewed as an optimization step following the knowledge selection. It is focused on learning knowledge term weights to distinguish between relevant and non-relevant terms and weighing higher those that are useful for the response generation.

Method
In this section we introduce the basic concepts and describe in detail our method KTWM for term-weighting of the injected knowledge. We assume that for a collection of posts P and responses R, we have a collection {K_pr} of knowledge sets with sentences relevant to the specific post-response pair (p, r). For a given pair (p, r) we consider a knowledge injection process that involves three stages: (1) knowledge selection, (2) knowledge term-weighting, and (3) decoding with the weighted knowledge terms. Our primary focus is on (2), i.e., the effectiveness of the term-weighting for the knowledge incorporated in KTWM. Thus we provide a detailed description of the term-weighting model (Figure 1) and of the KTWM decoder (Figure 2).

Knowledge Selection and Representation
We represent each post p, response r, and knowledge sentence k as a vector of terms. The set K_pr typically contains multiple knowledge sentences, and we use the BM25 retrieval method to rank the sentences by their relevance to the post (in the test phase) or the response (in the training phase). For knowledge injection we take the top-ranked sentence. When the knowledge injection requires a specific number of terms to be used, we include additional sentences from the ranked list to meet that requirement (used in §4.4.3).
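As an illustration, the BM25 ranking step can be sketched as follows (a minimal pure-Python Okapi BM25; the parameter values k1=1.5 and b=0.75 and the whitespace tokenisation are assumptions, as the paper does not specify its BM25 configuration):

```python
import math
from collections import Counter

def bm25_rank(query, sentences, k1=1.5, b=0.75):
    """Rank knowledge sentences against a query (a post at test time,
    a response at training time) using Okapi BM25 scores."""
    docs = [s.lower().split() for s in sentences]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    # document frequency of each term across the knowledge sentences
    df = Counter(t for d in docs for t in set(d))

    def idf(t):
        return math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)

    scores = []
    for d in docs:
        tf = Counter(d)
        s = sum(
            idf(t) * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            for t in query.lower().split() if t in tf
        )
        scores.append(s)
    # indices of sentences, best match first
    return sorted(range(n), key=lambda i: -scores[i])
```

The top-ranked index then selects the sentence injected into the model; further indices supply extra sentences when a fixed number of terms is required.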
When a knowledge sentence k is retrieved based on a response r as a query, we define a ground-truth vector GT_know for the knowledge k, with a weight of 1 assigned to the knowledge terms that are present in r and a weight of 0 to those that are not, i.e., GT_know = (e_1, e_2, . . . , e_l), where e_i ∈ {0, 1}, i = 1, . . . , l.

Encoders. We adopt the Transformer (Vaswani et al., 2017) as the backbone framework for the training and testing of KTWM. A Transformer encoder consists of a self-attention layer and a transition layer involving layer normalisation and a residual network. Formally, the attention is defined as

Attention(Q, K, V) = softmax(QK^T / √d_m) V,   (1)

where Q, K, and V are embedding matrices and d_m is the embedding dimension of the model. We first compute the dot-product similarity of Q and K and then apply the weighted summation with V, so that the representation of Q is updated with information from K and V. If Q, K, and V originate from the same source, e.g., an input post, the attention is referred to as self-attention. Otherwise, if they originate from different sources, e.g., Q relates to the decoding token while K and V come from a post, the attention becomes a mutual-attention operation.

Figure 1 shows the Transformer encoders (encoders for short) used for the post, knowledge, and response representations and processing. We use w to designate an original term and w̄ to designate the term's representation. In Figure 1, n, m, and l are three pre-defined hyper-parameters which refer to the lengths of the post (p), response (r), and knowledge (k), respectively (e.g., w_pi means the i-th term of the post). Any sequence that is longer or shorter than the given length will be truncated or padded to that length. By applying the encoder we obtain the post term representations

V_post = (w̄_p1, w̄_p2, . . . , w̄_pn)   (2)

from the original terms w_p1, w_p2, . . . , w_pn. Similarly to V_post in Eq. (2), we obtain V_know and V_resp as the term representations of the corresponding knowledge and response, respectively.
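The attention of Eq. (1) can be illustrated with a short NumPy sketch (random toy matrices stand in for the trained embeddings):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_m)) V."""
    d_m = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_m)                   # (len_q, len_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights                       # updated Q repr., attention distribution

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # e.g. decoding-token representations
K = rng.normal(size=(6, 8))   # e.g. post term representations
V = K                         # keys and values from the same source (mutual attention)
out, A = attention(Q, K, V)
```

When Q, K, and V all come from the same sequence this computes self-attention; with Q from the decoder and K, V from the post it is the mutual-attention case described above.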

Knowledge Term Weighting
The fundamental premise of our approach is that knowledge terms related to or present in the response should be more effective in improving dialogue generation. Thus, it can be beneficial to use methods such as attention distribution of response and knowledge embeddings to determine the weights of individual knowledge terms. However, in the real setting and during the test phase, we can only use terms and knowledge related to the post. Furthermore, the post embeddings can significantly differ from the response ones. Thus, assigning weights to the knowledge terms based on their similarity to post embeddings is unlikely to be sufficient (Xing et al., 2018).
For that reason, we aim to learn how to transform the post embeddings to be effective in knowledge term weighting. We achieve that by training a Post Embeddings Adapter that can, for a new post, generate Simulated Response Vectors (SRVs) and use them in place of the response vectors to score post related knowledge terms.
To that effect, we introduce a set of Multi-Layer Perceptrons (MLPs):

w̄_sj = MLP_j(w̄_p1, w̄_p2, . . . , w̄_pn),   (3)
V_SRV = (w̄_s1, w̄_s2, . . . , w̄_sm),   (4)

where W_i and b are the trainable parameters that each MLP applies to the representation of term p_i of the post p, and w̄_sj is the representation of the j-th term of the simulated response vector (SRV). The number of MLPs is the same as the number of terms in a given response.
During the training phase, the MLPs learn a transformation of the post embeddings into SRVs that captures the ground-truth response representation for a given post p. The SRVs are then used to assign appropriate weights to the knowledge terms when response information is not available.

SRVs Approximation and Training. The training phase begins with V_post, V_know, V_resp and randomly initialised MLP parameters to produce the initial V_SRV for a given post. Each iteration then involves comparison of (a) the response embeddings V_resp with the knowledge embeddings V_know, and (b) the SRVs with the knowledge embeddings V_know. More precisely, we compute the term-wise attention distributions A_rk and A_sk:

A_rk = softmax(V_resp V_know^T / √d_m),   (5)
A_sk = softmax(V_SRV V_know^T / √d_m),   (6)

where A_rk ∈ R^{m×l} and A_sk ∈ R^{m×l}; m and l are the hyper-parameters for the maximum lengths of the response and the knowledge sentence.
A_rk reflects the relationship between the response terms and the knowledge terms: for each response term, A_rk includes attention scores with all knowledge terms. Similarly, A_sk includes attention scores between the SRVs and the knowledge representations. The knowledge terms with larger response-knowledge attention scores are expected to produce output closer to the true response. In the training phase, that is guided by the filtering loss for A_rk:

L_filter = BCE(Mean(A_rk), GT_know),   (7)

where GT_know is the knowledge ground-truth vector which indicates whether the knowledge terms appear in the corresponding response, and BCE is the Binary Cross-Entropy loss function. Mean(·) computes the mean values for knowledge terms (in the matrix columns) across response terms (Mean(·) ∈ R^l).
At the same time we aim to train the MLPs to create SRVs similar to the response representations V_resp. In each iteration we compute and compare A_sk to A_rk and apply the approximation loss function:

L_approx = MSE(Mean(A_rk), Mean(A_sk)),   (8)

where MSE(·) is the Mean Squared Error function. Mean(·) of A_rk and A_sk produces l-length knowledge term vectors whose values characterise the importance of each knowledge term. We use these weights to update the knowledge vector:

V_know ← Mean(A_k) ⊙ V_know,   (9)

where ⊙ denotes element-wise multiplication and A_k corresponds to A_rk in the training phase and to A_sk in the test phase. V_post and the weighted knowledge vector V_know become the input for the KTWM decoder.
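The term-weighting step above can be sketched as follows (toy dimensions and random matrices stand in for the trained encoder outputs; at test time A_sk, computed from the SRVs, would replace A_rk):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
m, l, d_m = 5, 7, 16                    # response length, knowledge length, dims
V_resp = rng.normal(size=(m, d_m))      # response term embeddings
V_know = rng.normal(size=(l, d_m))      # knowledge term embeddings

# Term-wise attention distribution between response and knowledge terms
A_rk = softmax(V_resp @ V_know.T / np.sqrt(d_m))   # shape (m, l)

# Column-wise mean across response terms: one weight per knowledge term
term_weights = A_rk.mean(axis=0)                   # shape (l,)

# Re-weight the knowledge embeddings element-wise (broadcast over dimensions)
V_know_weighted = term_weights[:, None] * V_know
```

The weighted knowledge matrix, together with the post representations, is what the decoder consumes; noisy knowledge terms receive small weights and are thereby suppressed.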

KTWM Decoder
In order to incorporate multiple sources of input, we adopt a decoder design similar to the TED model of Zheng and Zhou (2019). Figure 2 shows the architecture of our KTWM decoder. The blue frames are the standard Transformer decoder set-up with a self-attention layer and a mutual-attention layer (for the post), followed by a feed-forward layer. KTWM includes an additional knowledge-mutual-attention layer which applies the same process to the knowledge, i.e., replicates the post-mutual-attention layer for the knowledge. However, while TED focuses on assigning different weights to different sources, KTWM is already provided with scored knowledge terms. We use V_PMA to denote the post-mutual attention, V_KMA the knowledge-mutual attention, and V_dec the decoding-token representation matrix. With the attention defined by Eq. (1), we can express:

V_PMA = Attention(V_dec, V_post, V_post),   (10)
V_KMA = Attention(V_dec, V_know, V_know).   (11)

The final mutual attention V_MA in the decoder is then calculated from V_PMA and V_KMA:

V_MA = V_PMA ⊕ V_KMA,   (12)

where ⊕ means element-wise summation. The feed-forward layer is a standard Transformer transition layer (Vaswani et al., 2017). Finally, we adopt the Negative Log-Likelihood (NLL) loss to train the model:

L_NLL = − Σ_t log P(y_t | y_<t, p, k).   (13)
Given a post (p), knowledge (k), and the previously predicted terms (y_<t), L_NLL maximises the probability of the currently predicted term. During the training phase, P(y_t|y_<t, p, k) is replaced with P(r_t|r_<t, p, k), i.e., we use the ground-truth response as the input instead of the model output from the previous steps (Goyal et al., 2016). We assume that all three loss functions are equally important and create the final loss function as their sum:

L = L_filter + L_approx + L_NLL.   (14)

KTWM thus provides a flexible learning framework, enabling injection of knowledge based on different selection criteria. We compare KTWM effectiveness when used with the Post-KS, SKT and TED models by incorporating the knowledge that each of these methods selects.
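The three training signals and their sum can be illustrated with NumPy stand-ins for the framework's loss functions (the toy vectors below are hypothetical, chosen only to show the quantities involved, not real model outputs):

```python
import numpy as np

def bce(pred, target, eps=1e-9):
    """Binary cross-entropy over pooled knowledge-term scores (the filtering loss)."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def mse(a, b):
    """Mean squared error between pooled attention vectors (the approximation loss)."""
    return np.mean((a - b) ** 2)

def nll(token_probs, eps=1e-9):
    """Negative log-likelihood of the ground-truth response tokens."""
    return -np.sum(np.log(np.clip(token_probs, eps, 1.0)))

mean_A_rk = np.array([0.9, 0.1, 0.8, 0.2])   # pooled response-knowledge attention
mean_A_sk = np.array([0.8, 0.2, 0.7, 0.3])   # pooled SRV-knowledge attention
gt_know   = np.array([1.0, 0.0, 1.0, 0.0])   # GT_know indicator vector
probs     = np.array([0.7, 0.9, 0.6])        # P(r_t | r_<t, p, k) at each step

loss = bce(mean_A_rk, gt_know) + mse(mean_A_rk, mean_A_sk) + nll(probs)
```

Summing the three terms with equal weight mirrors the paper's assumption that the losses are equally important; a weighted sum would be the natural generalisation.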

Experiments
We conduct empirical evaluation of KTWM compared to state-of-the-art baselines.

Datasets
In our experiments we use two publicly available datasets: Wizard of Wikipedia (Dinan et al., 2019) and Holl-E (Moghe et al., 2018). Both are purposefully created by human editors to support dialogue generation research.
Wizard of Wikipedia (Wiz). Dinan et al. (2019) employed Amazon Mechanical Turk (MTurk) workers to generate the dataset. The workers assume two different roles: a wizard (a teacher) and an apprentice (a student). An apprentice asks a question according to a given topic and a wizard answers the question based on the provided question-related information (retrieved from Wikipedia). The response can quote the retrieved knowledge or can be generated entirely by the wizard without considering the knowledge. Thus, for each question-response pair there is related knowledge that can be used for knowledge-grounded dialogue generation research. The Wiz dataset consists of 22,311 dialogues with 201,999 dialogue turns divided into a training dataset and two test datasets, referred to as the seen test set and the unseen test set. The seen test set includes topics that have already been seen in the training set; the unseen test set contains topics that do not appear in the training set.
Holl-E. Moghe et al. (2018) also made use of MTurk workers to create an annotated dataset that focuses on movies, as two workers talk with each other about a chosen movie. When answering the other worker's question, a worker is provided with four sources: movie plots, reviews, comments, and fact tables related to the movies. These sources can be considered as background knowledge. The final response is produced by copying from the sources or by modifying them. The Holl-E dataset provides a training set and a test set and contains 9,071 conversations, covering 921 movies.

Metrics, Setup and Baselines
Metrics. For performance evaluation, we adopted standard lexical-based metrics, BLEU (Papineni et al., 2002) and METEOR (Lavie and Agarwal, 2007), and an embedding-based metric, BOW Embedding (Liu et al., 2016). The BLEU 1-4 metrics measure the co-occurrence of n-gram terms in two given sequences, e.g., the generated responses and the ground-truth responses. METEOR is an adaptation of BLEU that considers the presence of synonyms and common word stems. BOW Embedding measures the similarity of two sentences from the semantic perspective. Specifically, it computes the average, greedy, and extrema metrics based on the word embeddings of the compared sentences. The average metric considers the cosine distance between pairs of sentence-level representations (e.g., the predicted response and the ground-truth response), obtained by averaging the representations of their constituent words, and calculates the average across all pairs. The greedy metric considers the maximum cosine scores along the rows and columns of the similarity matrix. The extrema metric first creates, for each sentence, a vector of the highest word-embedding values (along each dimension) and then computes the similarity score. BLEU, METEOR and BOW Embedding are calculated using publicly available NLG evaluation resources.

Experiment Setup. For the sake of comparison, we fixed a set of parameters across all the experiments. The number of dimensions in embeddings is set to 100. The vocabulary size is 30,000; the vocabulary is obtained by ranking terms by word frequency in the training set. The minimum sequence length is set to 8 and the maximum length is 30. We train using mini-batches of size 64. We use the Adam optimiser (Kingma and Ba, 2015) for optimisation. The initial learning rate is set to 0.001 and halved when the loss score does not decrease for two epochs. In the training phase we use response-retrieved knowledge, i.e., the sentences retrieved by the BM25 algorithm using responses as queries (see Figure 1).
The top-1 ranked knowledge sentence is injected into KTWM. In the test phase, we retrieve knowledge using the BM25 algorithm with posts as queries. All the experiments are conducted on a single TITAN V GPU. For the Wiz dataset, an experiment requires about 6 hours to complete; for Holl-E, about 2.5 hours.

Baselines. We compare KTWM with three strong baselines: Post-KS (Lian et al., 2019) uses an elaborate knowledge selection module and injects the selected knowledge into a generative model by approximating the prior distribution (i.e., p(k|p)) with the posterior distribution (i.e., p(k|p, r)).
SKT (Kim et al., 2020) considers knowledge selection as a sequential problem. It jointly trains a knowledge selection and a generative model by taking into account inputs and knowledge from previous turns.
TED (Zheng and Zhou, 2019) uses a knowledge-grounded generative model that assigns different weights to different sources when generating responses. It applies knowledge ranking using BM25, the same as in our setting.

Experiment Design
Our experiments focus on term weighting of the selected knowledge rather than the knowledge selection itself. Since the baseline models (Post-KS, SKT and TED) incorporate knowledge selection, we conduct a comparative evaluation of KTWM by incorporating the knowledge specific to each baseline method. Furthermore, since all three baselines inject knowledge at the sentence level, by selecting the top-ranked sentence, we do the same with KTWM.

Performance of Generating Response
We summarize the KTWM experiments with the Wiz and Holl-E datasets in Table 2. Results for the Wiz seen and unseen test sets are in Table 2, sections (a) and (b), respectively. Results for the Holl-E dataset are in Table 2, section (c). Since METEOR extends the BLEU metrics by considering word stems and synonyms, we take it as the main metric when discussing the experiment results. We observe that: (1) For all three test sets, KTWM outperforms each baseline method across all lexical and embedding-based metrics with a statistically significant difference.
(2) KTWM with Post-KS knowledge achieves the largest relative improvement considering the METEOR score: increase of 45.3%, 54.5% and 40.0% for the three test sets, respectively.
(3) For the Holl-E dataset, KTWM with TED knowledge outperforms KTWM with the knowledge of the other two baselines. TED knowledge comprises the top sentences retrieved using the BM25 algorithm.
(4) On the Wiz datasets, KTWM achieves a remarkable performance in terms of BLEU-1 and METEOR scores. A consistent and strong performance in the Wiz unseen test data indicates the robustness and generalization of KTWM.

Results of Knowledge Term Weighting
The filtering loss (Eq. (7)) controls KTWM's ability to distinguish between relevant and non-relevant knowledge terms, similarly to a binary classifier. We set a threshold of 0.5 on a knowledge term's predicted score and consider the overlap between the predicted and the ground-truth useful knowledge terms. This leads to a precision/recall evaluation of the positive- and negative-class predictions. Table 3 shows results from the Wiz seen test set; they are representative of the results for the other two test sets.
We observe that the precision of predicting useful terms is 50%, while for noisy terms it is over 91% (with a high F1 score of 94%). Thus KTWM term weighting is effective in detecting noisy terms, while only half of the predicted useful terms overlap with the ground-truth terms. Since noisy terms are assigned lower term weights, KTWM is effective in improving the dialogue generation performance. Appendix A shows illustrations of the KTWM noise reduction.
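The thresholded precision/recall evaluation described above can be sketched as follows (the per-term scores and ground-truth labels are hypothetical, chosen only to illustrate the computation):

```python
def precision_recall(scores, truth, threshold=0.5):
    """Precision and recall for the positive ('useful term') class,
    treating each knowledge term's score as a binary prediction at the threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical per-term predicted scores and GT_know labels for one knowledge sentence
scores = [0.9, 0.6, 0.4, 0.2, 0.7]
truth  = [True, False, True, False, True]
p, r = precision_recall(scores, truth)
```

Swapping the roles of the positive and negative classes gives the corresponding figures for noisy-term detection.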

Analysis of Input Sequence Length
We analyze the effects of knowledge de-noising by considering the useful terms proportion (UTP) as we increase the number of injected knowledge terms:

UTP = (number of distinct useful terms) / (number of all injected terms).

We use UTP_K for the UTP when the number of injected knowledge terms is K (e.g., UTP_30 for 30 knowledge terms). Our analysis shows that UTP_30 is 12.23% and that UTP gradually decreases with additionally injected knowledge, leading to a UTP_300 of only 3.35%. Figure 3 shows a gradual decline of the KTWM performance with increased length of injected knowledge, as the proportion of noisy terms increases.
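The UTP statistic can be computed as in the following sketch (the token lists are hypothetical, and whitespace tokenisation is an assumption; a useful term is one that appears in the ground-truth response):

```python
def utp(injected_terms, response_terms):
    """Useful-term proportion: distinct injected terms that appear in the
    ground-truth response, over the total number of injected terms."""
    response_set = set(response_terms)
    useful = {t for t in injected_terms if t in response_set}
    return len(useful) / len(injected_terms)

knowledge = "education is the process of facilitating learning or the acquisition".split()
response  = "sure education is important since it facilitates learning".split()
ratio = utp(knowledge, response)   # 3 distinct useful terms out of 10 injected
```

Computing this ratio for increasing numbers of injected terms reproduces the UTP_K curve discussed above.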
We also investigate the effects of the loss functions L_filter and L_approx on the KTWM performance by running experiments with and without them. In Table 4 we show the results on the Wiz seen test set using BM25 to select knowledge. We note that, after removing the L_filter loss, the BLEU-1 and Average scores decrease, while the BLEU-4 and METEOR scores increase. Since L_filter aims to ensure that relevant response terms are promoted, it is not surprising that the metrics focused on unigrams are most affected. However, this impact is less notable than that of removing L_approx. Without L_approx, KTWM loses the ability to align the simulated response vectors (SRVs) with the response embeddings, and hence to capture the attention distribution between the knowledge and the response embeddings that is needed to score knowledge terms. This increases the noise ratio and reduces the KTWM performance scores across all metrics.

Conclusions
Current knowledge-grounded dialogue models select and inject knowledge either through a traditional (unsupervised) retrieval technique, such as BM25, or by incorporating knowledge selection within the dialogue generation model. Most of them incorporate knowledge as sentences or paragraphs. Past research (Galley et al. (2019), Zheng et al. (2020)) provided evidence that inserting useful terms can increase the response generation performance, but it is necessary to control for the negative effects of noisy terms.
In our work, we introduce a novel Knowledge Term Weighting Model (KTWM) that performs term-level weighting and de-noising of the injected knowledge. We demonstrate that KTWM effectively estimates the weights of knowledge terms and yields better response generation performance than state-of-the-art baseline models when evaluated on two broadly used datasets. Besides the superior response generation outcomes, our research provides valuable insights into the impact of knowledge term weighting. As part of our future work we intend to (1) extend the KTWM models to incorporate multiple sources of evidence, such as balancing between selected knowledge and dialogue contexts (i.e., previous dialogue turns), and (2) take into account inter-dependencies among terms when weighting the selected knowledge.

Appendix A Examples of Knowledge Term Weights and KTWM Generated Responses
In Tables 5 and 6 we present examples of post/response pairs and selected knowledge with terms weighted by KTWM. As explained in §4.4.2, we use a threshold of 0.5 on term scores to classify terms into useful and noisy ones and study the effect of this selection on the overall performance of KTWM. In the examples, we visually show the weight of each term: terms are highlighted in different shades of blue according to their weight (note the colour legend at the bottom of the tables). All the examples are extracted from the Wiz seen test set and sorted by the number of words that exceed the threshold.
In Table 5 we see that the key words are tagged in dark blue, indicating that KTWM has assigned high weights to them. From the KTWM generated responses, we can see that if the words appear in both the post and the ground-truth response, KTWM works effectively, i.e., it correctly incorporates the injected knowledge into the generated response.
On the other hand, the negative examples in Table 6 show that the term scoring can be ineffective if there is no good overlap with the ground-truth response. We observe in these examples that most of the words with relatively high scores appear in neither the post nor the response. At the same time, if the injected knowledge does not contain useful terms, the produced responses might be irrelevant. In Table 6, most of the terms have a light blue colour, indicating that KTWM correctly detected the relatively low importance of these terms.
The examples in these two tables also confirm the statistical results shown and discussed in §4.4.3. KTWM term weights can still admit noise, especially when the injected knowledge does not contain useful terms (i.e., terms that are present in the ground-truth response), resulting in worse response generation performance.
We note that both sets of examples include highlighted punctuation (e.g., ',') and structural language terms (e.g., 'the', 'is') which obtain high KTWM weights. We assume that such terms are widely distributed in the post and response sets and are therefore detected as important.
Examples from Tables 5 and 6:

Post: I love avengers!! Have you watched the Infinity War movie? I think it was really good and I can't wait till the next movie to find out what happens.
Response: The Avengers Infinity War was so good. What a cliff hanger. Blows my mind that the Avengers made their comic debut way back in 1963.
Knowledge terms weighted by KTWM: In "Avengers: Infinity War", the Avengers join forces with the guardians of the galaxy to confront Thanos, who is trying to collect the infinity stones.
Response generated by KTWM: I love the Avengers. the Avengers was the Avengers, Infinity War. He is such a great show.

Post: I have never actually been to a horse race in Kentucky, where is the Derby held at anyways?
Response: Louisville Kentucky, it ends the Kentucky Derby festival usually.
Knowledge terms weighted by KTWM: The Kentucky Derby presented by Woodford reserve is a horse race that is held annually in Louisville, Kentucky, United States, on the first Saturday in may.
Response generated by KTWM: I'm not sure but I know that it is located in Kentucky race in the United States and Canada.

Post: New York city is quite impressive.
Response: Yeah they say that New York and LA are the two largest markets in our country.
Knowledge terms weighted by KTWM: The city of New York, often called New York City or simply New York, is the most populous city in the United States.
Response generated by KTWM: Yes, it is. The city of New York is the most populous city in the United States.

Post: Hi there! I drive a Honda Civic. Do you have a car?
Response: I have a VW golf, but was looking to change to a Honda Civic. I heard that the Civic has got larger throughout the years.
Knowledge terms weighted by KTWM: The Honda Civic type R is the highest performance version of the Honda Civic made by Honda Motor company of Japan.
Response generated by KTWM: I have a Honda Civic. They are the second most commonly used in the US in Japan.

Post: Well that's interesting, but seems like it could get confusing! I bet humans have been hiking ever since we could walk though!
Response: Yeah hiking has been around for ever, the term bush walking is endemic to Australia.
Knowledge terms weighted by KTWM: Hiking is the preferred term, in canada and the united states, for a long, vigorous walk, usually on trails (footpaths), in the countryside.
Response generated by KTWM: Well, hiking is a long term hiking is used in the United States, and the countryside on trails.