Knowledge-augmented Self-training of A Question Rewriter for Conversational Knowledge Base Question Answering


Introduction
Knowledge Base Question Answering (KBQA) has seen a recent surge of research interest, driven by the appearance of large-scale knowledge bases (KBs) such as Wikidata (Vrandečić and Krötzsch, 2014) and Freebase (Bollacker et al., 2008), as well as domain-specific KBs such as the Amazon product KB (Dong et al., 2020) and the academic KB (Tang et al., 2008). KBQA offers users an easier way to seek factual knowledge in natural language, regardless of the KBs' underlying structures. However, rather than issuing a single self-contained question, people tend to start from a topic and explore it with follow-up questions in a conversational manner. In particular, driven by the rise of practical conversational applications such as online customer service systems and intelligent personal assistants, Conversational KBQA (ConvKBQA) has been attracting increasing attention.
ConvKBQA poses great challenges for existing QA systems, which are designed to process a single self-contained question at a time. In a conversational setting, the follow-up questions are usually incomplete with missing entities, which is referred to as the ellipsis and coreference phenomenon. Figure 1 shows an example of ConvKBQA, where the initial question is usually full-fledged, mentioning clear topic entities, from which the conversation often drifts away. In this example, the conversation revolves around three subjects alternately: the book, its author, and the cathedral. Question Q2, which omits the title of the book, is an instance of entity ellipsis. Q3-Q5 exhibit entity coreference because "Fitzgerald" in Q3 and "Fitzgeralds" in Q4 refer to "F. Scott Fitzgerald" and the couple respectively. To facilitate such conversations, we can replace the coreferences with, or supplement the questions by, the entities they refer to or omit, making them self-contained. That is to say, the follow-up questions can be rewritten as shown on the right in Figure 1.
Existing works leverage the conversation history as well as the underlying KB to overcome the above ellipsis and coreference problem. Similar to single-turn KBQA (Lan et al., 2021), ConvKBQA methods can be categorized into retrieval-based (Kaiser et al., 2021; Lan and Jiang, 2021) and semantic parsing (SP)-based ones (Kacupaj et al., 2021; Marion et al., 2021; Plepi et al., 2021). The former first identify the topic entities of each turn's question and then expand a subgraph for answer reasoning. They usually maintain a gradually growing topic entity candidate set during the conversation, but run the risk of redundant and noisy candidates as well as error propagation from the candidates identified in previous turns. The latter parse each turn's question into an executable logic form, but suffer from the lack of annotated logic forms and the incompleteness of the questions. They usually augment the input with conversation history and extra KB information, but the long textual input and its distributional difference from the logical output increase the difficulty of parsing. This paper proposes rewrite-and-reason, a different pipeline framework that first rewrites the incomplete question into a self-contained one and then adopts any single-turn KBQA model. Compared with the existing retrieval-based methods, the rewriter in the first step of the framework does not rely on entity linking, and the rewritten questions of previous turns do not affect the next turn, which reduces error propagation. Besides, the rewriter maintains the original input format (i.e., the natural language question), which further enables SP-based single-turn KBQA models in the second step. Compared with SP-based ConvKBQA models that directly parse an incomplete question, such a decoupled rewrite-and-reason pipeline (rewrite-and-parse if an SP-based model is used for reasoning) reduces the parsing difficulty.
Since annotations of the rewritten questions are unavailable in the target ConvKBQA dataset, we take advantage of existing annotations in other conversational QA (ConvQA) datasets, as the inherent regular patterns of ellipsis and coreference in conversation are similar. However, the conversation styles are quite different: ConvQA is more chatty and verbose, while ConvKBQA is more knowledge-centric and concise. Even worse, when facing entities and relations never involved before, the rewriter might be at a loss. So, we further fine-tune it by knowledge-augmented self-training, where the augmented knowledge exposes the current underlying KB to the rewriter while the self-training process adapts the rewriter to the current conversation style. The same idea can also be applied when using other annotated datasets to train the subsequent SP-based KBQA models.
We conduct extensive experiments on ConvQuestions (Christmann et al., 2019), a ConvKBQA dataset grounded in Wikidata. The results reveal three major advantages: (1) The question rewriter, equipped with either the retrieval-based reasoner NSM (He et al., 2021) or the SP-based reasoner KoPL (Cao et al., 2022), achieves significant gains (+11.6% and +3.1% Hits@1 respectively) and creates new state-of-the-art results. (2) Compared with existing ConvKBQA systems, the rewritten questions help identify the topic entities, which results in a higher answer coverage rate (+12.9%). (3) Ablation studies demonstrate that the knowledge-augmented self-training can indeed adapt the pre-trained rewriter/KoPL to the concerned ConvKBQA task.

Contributions. (1) We propose a question rewriter decoupled from the subsequent KBQA model to enable a plug-and-play framework for ConvKBQA, which can adopt both retrieval- and SP-based single-turn KBQA models. (2) We devise a knowledge-augmented self-training strategy for adapting the pre-trained question rewriter to the concerned conversation style and the underlying KBs. (3) Both NSM and KoPL equipped with the rewriter achieve new SOTA performance on the well-adopted benchmark ConvQuestions.

Related Work
Single-turn KBQA falls into two mainstream categories, retrieval-based (Feng et al., 2021; He et al., 2021; Qiu et al., 2020; Sun et al., 2019) and SP-based (Cao et al., 2022; Gu et al., 2021; Ye et al., 2022). The former usually encode the entities and relations in KBs as well as the questions into a unified embedding space, based on which the answers are inferred. Recently, these methods often restrict the embedding space within a question-relevant subgraph expanded from the topic entities. Instead of representing KBs and questions, the latter parse the questions into logical forms, which can be executed over the KB to obtain the answers. The intermediate logical forms can take various types, such as the SPARQL used in (Das et al., 2021), the skeleton grammar proposed in (Sun et al., 2020), and the KoPL programming language designed in (Cao et al., 2022).
Conversational KBQA follows traditional single-turn KBQA systems and can also be categorized into retrieval- and SP-based methods. Different from single-turn KBQA, the crux here is to deal with the ellipsis and coreference problem. The former usually identify the topic entities of each turn's question and then retrieve a subgraph for answer reasoning. The major differences lie in how to identify the topic entities. For example, Lan and Jiang (2021) build an entity transition graph and apply a graph neural network to derive the topic entity distribution, while Kaiser et al. (2021) define four heuristic measures to estimate such a distribution. The SP-based methods (Kacupaj et al., 2021; Marion et al., 2021; Plepi et al., 2021) also parse each turn's question into a logic form, similar to single-turn KBQA, but the difficulty is the incompleteness of the questions to be parsed. So they augment the model input with conversation history as well as extra KB information to parse more accurate logic forms. However, this inevitably introduces additional noise that might hamper the parsing performance.
Conversational QA answers questions over unstructured text data instead of a structured KB. Thus, different from ConvKBQA, ConvQA usually performs question rewriting, document retrieval, and reading comprehension on text data (Anantha et al., 2021; Elgohary et al., 2019). Besides, the conversations in ConvQA are more chatty than those in ConvKBQA. Such a difference also hinders us from directly using the annotations in ConvQA to train ConvKBQA models.

Problem Formulation
A KB $\mathcal{K}$ is composed of a large number of $(h, r, t)$ triplets, where $h$, $r$, and $t$ represent a head entity, a relation, and a tail entity respectively. KBQA seeks answers to a natural language question from the given KB. ConvKBQA extends single-turn QA to multi-turn QA conversations. To learn such a ConvKBQA system, we collect a dataset $\{C_1, C_2, ..., C_N\}$ including $N$ conversations, where each conversation $C_i$ starts with a seed entity $s_i$ and lasts for $K$ turns, denoted as $C_i = \{(q^i_1, A^i_1), (q^i_2, A^i_2), ..., (q^i_K, A^i_K)\}$, with $(q^i_t, A^i_t)$ being the $t$-th turn of the conversation. ConvKBQA seeks the answers $A^i_t$ to each $q^i_t$ from a given $\mathcal{K}$ based on the conversation history $H^i_t = \{(q^i_1, A^i_1), ..., (q^i_{t-1}, A^i_{t-1})\}$. Specially, for the initial question $q^i_1$, the history $H^i_1$ is empty.

The Rewrite-and-Reason Framework
In this section, we first introduce the proposed rewrite-and-reason framework, and then elaborate on each part of our model, i.e., the pre-training and fine-tuning of the question rewriter, as well as the downstream reasoners used in our work.

Overview
To address the challenges of ellipsis and coreference in the conversational setting, we propose a rewrite-and-reason framework that first rewrites the incomplete questions into self-contained ones, and then adopts a single-turn KBQA model to answer the rewritten questions. The general workflow of the framework is illustrated in Figure 2. For the rewriter, we propose a pre-training plus fine-tuning paradigm due to the absence of supervision signals in ConvKBQA. Specifically, the rewriter is first pre-trained on the open-domain ConvQA dataset CANARD (Elgohary et al., 2019) with gold annotations and then fine-tuned on the ConvKBQA dataset ConvQuestions (Christmann et al., 2019) with knowledge-augmented self-training.
For the reasoner, we explore both retrieval- and SP-based models. Specifically, based on the rewritten questions, we retrieve subgraphs to enable the reasoning of the former, and adopt a similar pre-training and fine-tuning paradigm for learning the latter.

Question Rewriter
We explain how to (1) pre-train the question rewriter and (2) fine-tune it by knowledge-augmented self-training.

Pre-training
As all the conversation history is posed in natural language, it is applicable to leverage pre-trained sequence-to-sequence (Seq2Seq) language models (PLMs) such as BART (Lewis et al., 2020) and T5 (Raffel et al., 2020) for question rewriting. A simple and effective way is to concatenate the historical question-answer pairs $H^i_t$ as well as the current question $q^i_t$ as the input $[H^i_t; q^i_t]$ of the PLM and output the rewritten question $\hat{q}^i_t$, which is supposed to be a standalone question. Then, with the supervision of the gold rewritten question $\bar{q}^i_t$, we continue to pre-train the PLM by maximizing the probabilities of generating all the $K$-turn gold rewritten questions, i.e.,
$$\max \sum_{i=1}^{N}\sum_{t=1}^{K} \log P\big(\bar{q}^i_t \mid [H^i_t; q^i_t]\big). \quad (1)$$
Because of the absence of gold annotations for rewritten questions in ConvKBQA, we can only pre-train the question rewriter on an existing ConvQA dataset with annotated rewritten questions.
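As an illustration, below is a minimal sketch of this pre-training step using a HuggingFace-style T5 setup; the separator string and the build_input helper are our own illustrative choices rather than the exact implementation.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def build_input(history, question, sep=" ||| "):
    # history: list of (question, answer) pairs from the previous turns
    turns = [f"{q} {a}" for q, a in history]
    return sep.join(turns + [question])

def training_step(history, question, gold_rewrite):
    src = tokenizer(build_input(history, question), return_tensors="pt",
                    truncation=True, max_length=512)
    tgt = tokenizer(gold_rewrite, return_tensors="pt",
                    truncation=True, max_length=64)
    # standard teacher-forced cross-entropy over the gold rewrite tokens
    loss = model(input_ids=src.input_ids,
                 attention_mask=src.attention_mask,
                 labels=tgt.input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```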

Knowledge-augmented Self-training
The regular patterns of ellipsis and coreference across different conversations are captured by the above pre-trained question rewriter. However, there is an inevitable gap between the distributions of the more chatty open-domain conversations and the more factual KBQA conversations. Thus, we leverage self-training, a well-adopted domain adaptation method, to overcome the lack of annotations in the target domain (Wei et al., 2021; Mukherjee and Awadallah, 2020; Xie et al., 2020; Zou et al., 2019).
Self-training trains the model on the labeled data and then applies it to the unlabeled data to generate pseudo labels. It is crucial to select confident pseudo labels on the target domain for adapting the pre-trained model to the target domain.
Different from traditional self-training, we change the input $[H^i_t; q^i_t]$ of the original pre-trained model into the knowledge-injected one $[\tilde{H}^i_t; q^i_t]$. Below we explain why and how we inject the knowledge and select the pseudo labels for self-training.
Knowledge Injection. Injecting knowledge into the original conversation can help the rewriter distinguish the right entities from ambiguous ones. For example, in Figure 3, "the Fitzgeralds" in Q4 is successfully rewritten into "F. Scott Fitzgerald and Zelda Fitzgerald" rather than only one of them, because when coming across Q4, both the entities "F. Scott Fitzgerald" and "Zelda Fitzgerald" are augmented with the relation "spouse", which helps the rewriter correctly recognize the two topic entities in the question.
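To make the injection concrete, the following sketch shows one plausible way to splice retrieved (relation, tail entity) facts into the history before it is fed to the rewriter; the bracketed annotation format is purely illustrative and not prescribed by the method.

```python
def inject_knowledge(history, retrieved_facts):
    # history: list of (question, answer) strings from previous turns
    # retrieved_facts: {entity: [(relation, tail_entity), ...]} from the relation retriever
    enriched = []
    for question, answer in history:
        for entity, facts in retrieved_facts.items():
            if entity in question or entity in answer:
                notes = "; ".join(f"{rel}: {tail}" for rel, tail in facts)
                answer = f"{answer} [{entity} | {notes}]"   # attach facts to that turn
        enriched.append((question, answer))
    return enriched

# e.g. retrieved_facts = {"F. Scott Fitzgerald": [("spouse", "Zelda Fitzgerald")]}
```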
To enable the knowledge-augmented self-training, we train a relation retriever to identify the relations most relevant to the current turn's question for knowledge injection. The relation retriever is instantiated by BERT (Kenton and Toutanova, 2019), which accepts the concatenation of a question $q_m$ and a relation $r_j$ as input and takes the [CLS] token as the output embedding to compute the relevance score between $q_m$ and $r_j$.
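A minimal sketch of such a cross-encoder retriever is given below, assuming a HuggingFace BERT backbone with a linear scoring head on the [CLS] embedding; the training loss over the pseudo labels is omitted.

```python
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(bert.config.hidden_size, 1)  # relevance score from [CLS]

def relevance_score(question, relation):
    # encodes "[CLS] question [SEP] relation [SEP]" and scores the [CLS] embedding
    enc = tok(question, relation, return_tensors="pt", truncation=True)
    with torch.no_grad():
        cls = bert(**enc).last_hidden_state[:, 0]
    return score_head(cls).squeeze(-1)

def rank_relations(question, candidate_relations, top_k=1):
    scores = torch.cat([relevance_score(question, r) for r in candidate_relations])
    order = scores.argsort(descending=True)
    return [candidate_relations[int(i)] for i in order[:top_k]]
```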
The ConvKBQA dataset only contains (question, answer) pairs without the relations that derive the answers, so we construct a pseudo dataset for training the above relation retriever. Since the seed entity of each conversation is given, and the follow-up questions usually revolve around the entities mentioned before, we can start from the already appearing entities and check whether their one-hop relations can reach the answers. A one-hop relation that derives the answer, together with the current question, composes a pseudo (question, relation) label. Specifically, we automatically construct the pseudo labels from each conversation as follows. A topic entity set is initialized with the seed entity given in the first question. At each turn of the conversation, we enumerate each entity in the set and expand its one-hop relations, including both the outgoing and incoming relations. If a relation can derive the gold answer to this turn's question, we take the relation and the question as a pseudo label. Then we augment the topic entity set with the non-string answers of the last turn. We repeat the above operations until the final turn of the conversation. Figure 4 illustrates the construction process for the example in Figure 1. As a result, we construct a pseudo dataset consisting of about 32,000 (question, relation) pairs.
With the relation retriever, we can retrieve the relation most relevant to the current turn's question for the seed entity and each answer entity in the conversation history $H^i_t$, and then supplement these relations to the corresponding entities in $H^i_t$. In addition to injecting useful relation information, we also pad the tail entities, expecting to provide the rewriter with more choices for the topic entities. For example, given a conversation with the seed entity "Harry Potter", Q1 as "What is the first book of Harry Potter?" and Q2 as "Where is the author born?", if we can inject "J.K. Rowling" as the tail entity of the relation "author" for "Harry Potter", the rewriter has a chance to directly rewrite Q2 as "Where is J. K. Rowling born?" rather than "Where is the author of Harry Potter born?", which is a simpler one-hop question to answer.
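The pseudo (question, relation) construction described above can be sketched as follows, assuming a hypothetical KB client exposing one_hop(entity) and is_entity(answer) helpers used only for illustration.

```python
def build_pseudo_relation_labels(conversation, seed_entity, kb):
    # conversation: list of (question, gold_answers) per turn, in order
    # kb.one_hop(entity) -> list of (relation, neighbor), over both edge directions
    topic_entities = {seed_entity}
    labels = []
    for question, answers in conversation:
        for entity in list(topic_entities):
            for relation, neighbor in kb.one_hop(entity):
                if neighbor in answers:
                    labels.append((question, relation))
        # grow the topic set with this turn's answer entities (string literals skipped)
        topic_entities.update(a for a in answers if kb.is_entity(a))
    return labels
```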
Pseudo Label Selection. The pseudo labels are usually quite noisy, which might have a negative effect on model fine-tuning. Thus, we need to carefully devise the pseudo label selection strategy.
We select the pseudo rewritten questions according to their relationship with the corresponding answers. Specifically, for the question $q^i_t$, we first generate the rewritten question $\hat{q}^i_t$ with the pre-trained rewriter, and then keep those containing topic entities whose one-hop subgraphs cover the correct answers, because we conjecture that a rewritten question is more likely to be right if it can derive the answer by one-hop reasoning.
For identifying the topic entities, we use ELQ (Li et al., 2020), an entity linking tool for questions, to obtain the topic entity candidate set. Since it may contain some topic-irrelevant entities, we only keep those appearing in the conversation history or in the one-hop neighbors of the entities in the conversation history. If none of such entities can be recognized, we use the seed entity of the whole conversation as the topic entity of the current turn's question. Algorithm 1 in the appendix shows the details of topic entity identification.
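A sketch of this topic entity identification and pseudo-label selection, in the spirit of Algorithm 1, is given below; the linker and kb arguments stand in for ELQ and the Wikidata access layer and are assumed interfaces.

```python
def identify_topic_entities(rewrite, history_entities, seed_entity, kb, linker):
    # linker(rewrite) -> candidate entity names found in the rewrite (e.g. via ELQ)
    candidates = linker(rewrite)
    allowed = set(history_entities)
    for e in history_entities:                      # plus their one-hop neighbours
        allowed.update(nb for _, nb in kb.one_hop(e))
    topics = [c for c in candidates if c in allowed]
    return topics or [seed_entity]                  # fall back to the seed entity

def keep_pseudo_rewrite(topics, gold_answers, kb):
    # keep the rewrite only if some topic entity reaches a gold answer in one hop
    return any(nb in gold_answers for t in topics for _, nb in kb.one_hop(t))
```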
The objective function of knowledge-augmented self-training is:
$$\max \sum_{i=1}^{N'}\sum_{t=1}^{K'_i} \log P\big(\hat{q}^i_t \mid [\tilde{H}^i_t; q^i_t]\big), \quad (2)$$
where $\hat{q}^i_t$ is the selected pseudo label of the original question $q^i_t$. Since we only select partial labels from the whole dataset, the resultant conversation size $N'$ and the turn number $K'_i$ are less than the original sizes $N$ and $K$.

Reasoner
With the advantage of the decoupled framework, the rewritten questions can flexibly adapt to different downstream reasoners. We explore both retrieval- and SP-based reasoners. For the former, an additional procedure of topic entity identification from the rewritten questions is required for the subsequent subgraph retrieval and reasoning. For the latter, we need to overcome the lack of logic form labels.

Retrieval-based Reasoner
Retrieval-based methods usually represent the entities and questions as embeddings, based on which the relevance is calculated to rank the candidate answers. To improve accuracy and efficiency, the answer candidates are usually restricted within the subgraph expanded from the topic entities.
We employ the state-of-the-art NSM (He et al., 2021) as the retrieval-based reasoner. For each turn's rewritten question $\hat{q}^i_t$, we also use Algorithm 1 to find the topic entities $T^i_t$. Then we retrieve a $\tau$-hop subgraph starting from each topic entity and merge all of them into a unified subgraph $S^i_t$. We use the answers $A^i_t$ as supervision signals to train NSM. The objective function is defined as:
$$\max \sum_{i=1}^{N}\sum_{t=1}^{K} \log P\big(A^i_t \mid \hat{q}^i_t, S^i_t\big). \quad (3)$$
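The subgraph retrieval step might look like the following sketch, where kb.one_hop is again an assumed interface and the NSM training itself is not shown.

```python
def retrieve_subgraph(topic_entities, kb, tau=2):
    # kb.one_hop(entity) -> list of (relation, neighbor) pairs; an assumed interface
    visited, frontier = set(topic_entities), set(topic_entities)
    triples = set()
    for _ in range(tau):                       # expand tau hops from every topic entity
        next_frontier = set()
        for entity in frontier:
            for relation, neighbor in kb.one_hop(entity):
                triples.add((entity, relation, neighbor))
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return triples                             # union over all topic entities = subgraph S
```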

Semantic Parsing-based Reasoner
The SP-based methods usually parse the questions into logic forms, which can be directly executed on the KB to retrieve the answers, but they depend on annotated logic forms. Here, we explore a pre-training and fine-tuning paradigm with knowledge-augmented self-training similar to that of the question rewriter to train such SP models. We first pre-train an SP model on the KQA Pro dataset (Cao et al., 2022) consisting of (question, KoPL) labels, where KoPL is the logic form. The pre-trained SP model can translate a question into its KoPL program. We apply it to generate the KoPL program $\hat{p}^i_t$ for each rewritten question $\hat{q}^i_t$ on the ConvKBQA dataset. Then we keep the $(\hat{q}^i_t, \hat{p}^i_t)$ pairs in which $\hat{q}^i_t$ is filtered following the rewriter's pseudo label selection strategy and $\hat{p}^i_t$ is restricted to programs that contain the identified topic entity as well as the corresponding relation and can be executed to obtain the correct answers. We also use Algorithm 1 to identify topic entities.
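A sketch of this pseudo (question, program) selection is given below; parser and kb.execute stand in for the pre-trained KoPL model and its execution engine, and the string-matching checks are simplifications of the actual constraints.

```python
def select_program_pairs(examples, parser, kb):
    # examples: list of (rewritten_question, topic_entities, relations, gold_answers)
    kept = []
    for question, topics, relations, answers in examples:
        program = parser(question)                 # inferred KoPL program (text form)
        if not any(t in program for t in topics):
            continue                               # must mention an identified topic
        if not any(r in program for r in relations):
            continue                               # must mention a retrieved relation
        try:
            predicted = kb.execute(program)        # run the program over the KB
        except Exception:
            continue                               # drop non-executable programs
        if set(predicted) & set(answers):
            kept.append((question, program))       # keep only answer-consistent pairs
    return kept
```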
To inject knowledge during self-training, we supplement $\hat{q}^i_t$ with relevant triplets of the topic entities to form $\tilde{Q}^i_t$, where the relation in each triplet is determined by the relation retriever. The objective function of the SP model self-training is:
$$\max \sum_{i=1}^{\tilde{N}}\sum_{t=1}^{\tilde{K}_i} \log P\big(\hat{p}^i_t \mid \tilde{Q}^i_t\big), \quad (4)$$
where $\tilde{N}$ and $\tilde{K}_i$ represent the conversation size and the turn number of the $i$-th conversation in the selected pseudo dataset. To further improve accuracy, we also modify the inferred KoPL programs with the identified topic entities and corresponding relevant relations by Algorithm 2. We present more details about the training and inference process of KoPL in Appendix A.5.

Experimental Settings
Dataset. We evaluate our method on ConvQuestions (Christmann et al., 2019), a ConvKBQA dataset created on Wikidata by crowdworkers on Amazon Mechanical Turk. ConvQuestions contains about 11,000 conversations from five domains: "Movies", "TV Series", "Music", "Books", and "Soccer", which are partitioned into 7,000/2,000/2,000 for training/validation/testing. Each conversation is a 5-turn dialog, with only the ground truth answers annotated. We do not evaluate on another ConvKBQA dataset, CSQA (Marion et al., 2021, https://amritasaha1812.github.io/CSQA), because CSQA is less challenging in the conversational setting. The turns in CSQA are simply linked together into a conversation if adjacent questions have overlapping entities or relations, which implies that the topic entities of a question can be found in either the current or the last turn. Thus the challenge of CSQA mostly lies in the downstream reasoning against the KB, rather than the ellipsis and coreference in the conversational setting.
Evaluation Metrics. To evaluate the question rewriter, since we do not have ground truth rewrites, we use the intermediate answer coverage rate (ACR), i.e., the percentage of rewritten questions from which we can extract topic entities that reach the correct answers within their one-hop subgraphs.
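A sketch of how ACR can be computed is shown below; topic_fn stands in for the topic entity identification of Algorithm 1 and kb.one_hop is an assumed KB interface.

```python
def answer_coverage_rate(examples, topic_fn, kb):
    # examples: list of (rewritten_question, gold_answers) pairs
    # topic_fn(rewrite) -> topic entities; kb.one_hop(entity) -> (relation, neighbor)
    covered = 0
    for rewrite, answers in examples:
        topics = topic_fn(rewrite)
        one_hop = {nb for t in topics for _, nb in kb.one_hop(t)}
        covered += bool(one_hop & set(answers))   # count rewrites that reach an answer
    return covered / len(examples) if examples else 0.0
```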
For the QA performance, we use the top-1 hit ratio (H1) to evaluate whether the top-1 predicted entity is the correct answer, and also report the F1 score by viewing questions as multi-answer ones.
When evaluating any question, the answers to the previous turns are predicted by the models rather than taken from the ground truth answers given in the dataset.
Baselines. We compare with CONVEX (Christmann et al., 2019), OAT (Marion et al., 2021), CONQUER (Kaiser et al., 2021), and Focal Entity (Lan and Jiang, 2021) as baselines. OAT is an SP-based method that incorporates the conversation history and extra KB information, besides the current question, as input to decode its defined logic forms. The other three are retrieval-based methods. CONQUER maintains a topic entity candidate set during the conversation for identifying the topic entities, which are scored by heuristic measures. CONVEX and Focal Entity further build a transition graph on the candidate set, where CONVEX defines some heuristic rules and Focal Entity adopts graph neural networks to identify topic entities. They all infer the answer based on the relevance between the current question and the paths derived from the topic entities.

Overall QA Evaluation
We compare our proposed rewrite-and-reason framework with existing state-of-the-art ConvKBQA methods, including both end-to-end (SP-based) and pipeline (retrieval-based) frameworks. The results are shown in Table 1. Compared with the baselines, we observe an obvious performance improvement, e.g., 11.6% H1 improvement by the question rewriter (QR) combined with NSM, and 3.1% with KoPL. The results show that the question rewriter with knowledge-augmented self-training outperforms both end-to-end and pipeline methods, and improves the performance not only when equipped with retrieval-based methods but also with SP-based methods.
Moreover, to intuitively demonstrate the effectiveness of our question rewriter, we also report the intermediate answer coverage rate in the first column of Table 1, which also serves as a quantitative analysis of the noise introduced in the pipeline process. Since our rewriter outputs rewritten questions instead of topic entities, we leverage Algorithm 1 to identify the topic entities, from which we retrieve their one-hop related entities to compute the answer coverage rate. In terms of ACR, our proposed pipeline method rewriter-reasoner outperforms the SOTA pipeline method Focal Entity by 12.9%, which demonstrates that the proposed question rewriter yields more accurate topic entities and significantly reduces the noise in the pipeline process.

Ablation Experiments
To investigate the effectiveness of different parts of our framework, we remove the injected knowledge from the self-training input and the whole self-training step of the question rewriter one at a time to show their respective effects.

Effect of Knowledge Injection
We explore whether the knowledge incorporated in self-training, retrieved by the relation retriever, can improve the performance from three aspects: the direct retrieval performance on the pseudo (question, relation) dataset, the rewriter's answer coverage rate, and the final QA performance.
First, the retrieval results on the pseudo (question, relation) dataset in terms of H1, H3, and H5 are 83.0%, 91.5%, and 93.6% respectively. To further prove its validity, we also use the relation retriever as the reasoner combined with the question rewriter. The overall H1 is 32.3%, as shown in Table 1. Although it is much simpler than NSM (He et al., 2021), its QA performance is comparable to that of the baseline Focal Entity, which uses a more complex reasoner. All the results show that the relation retriever can retrieve relatively accurate relations.
Second, we use the answer coverage rate to evaluate the quality of the rewriter. Here, for a fair comparison, we use the gold answers instead of the predicted answers of the previous turns to exclude the impact of reasoning capability. Moreover, we strictly restrict the rewritten questions to those containing the complete topic entity names to exclude the impact of the entity linking procedure. The answer coverage results with and without the injected knowledge (i.e., w/o k) are reported in Table 2, on both the whole dataset (All) and the test set (Test). The QA results in Table 3 also indicate that when the knowledge is removed from the rewriter's self-training process, the QA performance of the subsequent reasoner, for both NSM and KoPL, drops.

Effect of Self-Training
To demonstrate the effectiveness of our proposed self-training mechanism, we remove the whole self-training process and directly use the pre-trained rewriter for question rewriting (i.e., w/o st). The answer coverage rate and the QA performance are given in Table 2 and Table 3 respectively. We can see that without self-training, the answer coverage rate drops by 3.1% on All and 2.4% on Test, and H1 drops by 4.4% with NSM and 1.0% with KoPL. These results demonstrate the effectiveness of self-training for the rewriter.

Evaluation of SP-based Methods
When exploring KoPL as the reasoner, we adopt a similar knowledge-augmented self-training strategy to transfer it from another dataset to the concerned dataset. Thus, we also conduct an ablation study to observe whether knowledge injection and self-training are useful for KoPL. As shown in Table 4, both the injected knowledge and self-training promote the QA performance on ConvQuestions, which is consistent with their effects on the proposed question rewriter. Since we study the transfer capability of the KoPL reasoner here, we do not report the performance of the modified KoPL programs as in Table 1 and Table 3, but only report the performance of the programs originally inferred by the KoPL model.

Discussions
Performance on Multiple Coreferences and Ellipses. How capable is the question rewriter when facing more complicated circumstances, such as multiple coreferences or multiple ellipses of entities or relations?
For multiple coreferences, since the gold rewritten questions are unavailable, we manually select and evaluate 100 samples with multiple coreferences from the ConvQuestions (Christmann et al., 2019) dataset. The accuracy is about 93%. We list three original questions and the correct rewritten questions produced by our model as follows: (1) "Is she younger than him?" => "Is Leslie Stefanson younger than James Spader?" (2) "Were the two living at the same period?" => "Were Mikhail Bulgakov and Mikhail Lermontov living at the same period?" (3) "Was that their last album?" => "Was The White Album The Beatles last album?" For multiple ellipses, since such samples cannot be observed in the ConvQuestions dataset, we manually create and evaluate 5 samples like "Birthplace and birthtime?" and "When and where?". The rewritten questions are "Birthplace and birthtime of soccer player Zinedine Zidane?" and "When and where did Haruki Murakami win the Franz Kafka Prize?". All 5 samples are correctly rewritten because it is easy for the rewriter to generalize from a single ellipsis such as "when" or "where" to the multiple ellipses "when and where".
Performance on Self-Contained Questions. Does the question rewriter have a negative effect when the question is already self-contained? Since the questions in ConvQuestions are almost all incomplete, we randomly sample 100 self-contained questions from CSQA (Marion et al., 2021) and rewrite them with our model. We observe that the rewriter prefers to retain the original question as much as possible. About 95% of the rewritten questions are exactly the same as the original questions, and the remaining 5% have an average similarity of 0.978 (#common words at the same position / #words in the longer question) to the original questions. The results indicate that the proposed question rewriter does not harm already self-contained questions.
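For reference, the similarity measure quoted above can be computed as in the following small sketch.

```python
def positional_similarity(original, rewritten):
    # (#common words at the same position) / (#words in the longer question)
    a, b = original.split(), rewritten.split()
    common = sum(x == y for x, y in zip(a, b))
    return common / max(len(a), len(b))

# positional_similarity("Who wrote it?", "Who wrote it?") -> 1.0
```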

Conclusion
This paper proposes a rewrite-and-reason framework to address ellipsis and coreference in ConvKBQA. To overcome the lack of annotations, we introduce a knowledge-augmented self-training mechanism for training the question rewriter. Thanks to the decoupled design, we can combine the rewriter with both retrieval- and SP-based single-turn KBQA models. The experimental results show that our method outperforms existing ConvKBQA systems and achieves a new state-of-the-art.
The model is trained for 5 epochs with a batch size of 10, where for each of the 10 questions, we retrieve the positive relations following Figure 4 and complete them to 64 relations in total by negative sampling. We save the checkpoint that obtains the best H1 on the validation set. A training epoch takes about 15-20 minutes, and it takes about 1-2 hours for the model to converge.

A.4 Question Rewriter
Pre-training. We use T5-base as the backbone of the question rewriter and fine-tune it with the Transformers Trainer on CANARD (Elgohary et al., 2019). We set the learning rate to 1e-4, the gradient accumulation steps to 4, the batch size to 32, and the number of epochs to 5. A training epoch takes about 8-12 minutes, and it takes about 1 hour for the model to converge.
Pseudo Label Generation. We adopt the pre-trained question rewriter to generate pseudo labels on our dataset ConvQuestions for fine-tuning. The whole generation process takes about 30-60 minutes.
Once we get the pseudo labels, i.e., the rewritten questions, we apply ELQ (Li et al., 2020) to obtain topic entities. However, since the tool may produce some irrelevant entities, we only keep the entities appearing in the conversation history or in the one-hop neighbors of the entities in the conversation history. If none of such entities can be recognized, we use the seed entity of the whole conversation as the topic entity of the current turn's question. Algorithm 1 presents the details. The topic entity identification process on the whole ConvQuestions dataset takes about 3-4 hours.
We select rewritten questions that contain entities whose one-hop subgraphs cover the correct answers. That is to say, on the one hand, the topic entities identified from the rewritten question can reach the correct answers, and on the other hand, the textual names of the topic entities should appear completely in the rewritten question. In this way, we exclude the impact of the entity linking procedure and acquire high-quality rewritten questions.

Figure 2: Overview of the rewrite-and-reason framework. A question rewriter is pre-trained on the open-domain ConvQA dataset and is fine-tuned on the concerned ConvKBQA dataset by knowledge-augmented self-training, where the knowledge is obtained by a relation retriever that is trained on pseudo (question, relation) pairs. Given the self-contained questions output by the rewriter, we train a retrieval-based reasoner on the retrieved subgraph, and also train an SP-based reasoner by a similar pre-training and fine-tuning paradigm.

Figure 3: An example of conversation history supplementation. The relations retrieved as relevant to the current turn's question are supplemented to the seed entities and answer entities in the conversation.
Figure 4: The automatic construction process of pseudo (question, relation) labels.

Table 2: Ablation studies of knowledge injection and self-training for the rewriter by answer coverage rate (%). All means evaluating on the train, validation, and test sets.

Table 3: Ablation studies of knowledge injection and self-training for the rewriter by QA performance (%).

Table 4: Ablation studies of knowledge injection and self-training for KoPL by QA performance (%).