PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training

Fact verification has attracted a lot of attention recently, e.g., in journalism, marketing, and policymaking, as misinformation and disinformation can sway one's opinion and affect one's actions. While fact-checking is a hard task in general, in many cases, false statements can be easily debunked based on analytics over tables with reliable information. Hence, table-based fact verification has recently emerged as an important and growing research area. Yet, progress has been limited due to the lack of datasets that can be used to pre-train language models (LMs) to be aware of common table operations, such as aggregating a column or comparing tuples. To bridge this gap, this paper introduces PASTA for table-based fact verification via pre-training with synthesized sentence-table cloze questions. In particular, we design six types of common sentence-table cloze tasks, including Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique, based on which we synthesize a large corpus consisting of 1.2 million sentence-table pairs from WikiTables. PASTA uses a recent pre-trained LM, DeBERTaV3, and further pre-trains it on our corpus. Our experimental results show that PASTA achieves new state-of-the-art (SOTA) performance on two table-based fact verification datasets, TabFact and SEM-TAB-FACTS. In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous SOTA by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small test set is narrowed to just 1.5 points (90.6% vs. 92.1%).


Introduction
Fact verification, which checks the factuality of a statement, is crucial for journalism (Shu et al., 2017), and is increasingly being applied in other fields (Ott et al., 2011; Yoon et al., 2019). According to Duke Reporters' Lab, there are 300+ active certified fact-checking organizations worldwide. Automatic and explainable approaches, a.k.a. reference-based approaches, are widely used to assist fact-checkers. They verify the input statement against a trusted source, such as relevant passages from Wikipedia (Popat et al., 2017; Thorne et al., 2018; Shaar et al., 2020). Recently, table-based fact verification has been extensively studied (Chen et al., 2020a; Zhong et al., 2020; Eisenschlos et al., 2020) due to the wide availability of tabular data.
Evidently, performing fact verification over tables requires the ability to reason about table-based operations, such as aggregating the values in a column or comparing tuples. For example, for the statement S1 in Figure 1, it is desirable to reason about the operation over table T that compares the viewers of Night Moves with 3.61 to determine whether S1 is entailed or refuted by T.
Most previous work (Herzig et al., 2020; Wang et al., 2021a; Schlichtkrull et al., 2021) leverages pre-trained language models (LMs) (Devlin et al., 2019; Liu et al., 2019), which are originally designed for unstructured data and have the key limitation of overlooking such operations. Some approaches (Zhong et al., 2020; Yang et al., 2020) attempt to explicitly capture the operations by generating a logical form (e.g., a tree) containing the operations from the statement via semantic parsing techniques. However, such approaches face the problem of "spurious programs" (Chen et al., 2020a), due to weak supervision signals in semantic parsing.
To address the above issues, we propose PASTA, a table-operations aware approach. Instead of relying on semantic parsing, PASTA captures table-based operations by designing a novel sentence-table cloze pre-training strategy that better guides LMs to reason about table-based operations. We tackle two challenges for pre-training LMs towards supporting table-based fact verification:
• Challenge 1: What types of tasks should be designed, so as to pre-train (or teach) LMs to be aware of operations over tables?
• Challenge 2: How to obtain a large-scale and high-quality corpus for pre-training?
To address Challenge 1, PASTA automatically synthesizes sentence-table cloze questions. It first synthesizes operations from tables, and then generates cloze tasks by masking the key tokens corresponding to the table-based operations, e.g., "more than" and "sum of" in Figure 1. Then, LMs are pre-trained to predict the masked operation-aware tokens based on the tables.
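As a hedged illustration (the helper function and sentence below are ours, not PASTA's actual code), the cloze-synthesis step can be sketched as masking only the operation-aware span recorded when the template is instantiated:

```python
# Illustrative sketch of operation-aware cloze synthesis: given a
# template-generated sentence and the span of operation-aware tokens
# recorded at template-instantiation time, mask only that span.
def make_cloze(sentence: str, op_span: str, mask_token: str = "[MASK]"):
    """Replace the operation-aware span with one mask per token."""
    assert op_span in sentence, "span must come from the template instance"
    n_tokens = len(op_span.split())
    masked = sentence.replace(op_span, " ".join([mask_token] * n_tokens), 1)
    return masked, op_span

cloze, answer = make_cloze(
    "There are 2 movies with more than 3.61 million viewers.", "more than"
)
# cloze == "There are 2 movies with [MASK] [MASK] 3.61 million viewers."
```

Unlike random MLM masking, the answer span is known exactly, so the pre-training signal always targets the operation.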
Regarding Challenge 2, PASTA uses a large table collection, WikiTables (Bhagavatula et al., 2013), and for each table, it synthesizes a diverse set of cloze tasks with six types of table-based operations, including Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique.
For implementation, PASTA uses a recent pre-trained LM, DeBERTaV3 (He et al., 2021a,b), which has better positional encoding of the input. To cope with the limited input length of DeBERTaV3, we introduce a select-then-rank strategy for large tables, which further improves the performance.
In sum, we make the following contributions. The experimental results show that PASTA achieves new state-of-the-art (SOTA) results on the two datasets. In particular, on the complex set of TabFact that contains multiple operations, PASTA outperforms the previous SOTA by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small test set is narrowed to 1.5 points (90.6% vs. 92.1%).

Problem Formulation
Let T be a table with m columns and n rows. Let T_{i,j} denote the cell in the i-th column and j-th row of T. Let S be a natural language (NL) statement.
The problem of table-based fact verification is formulated as follows: given an NL statement S and a table T, determine whether statement S is entailed or refuted by table T.
See Figure 1 for an example table about movies and their viewers, and two statements, where S1 contains a Comparative operation and S2 contains an Aggregation operation.

DeBERTa for Sentence-Table Encoding
Inspired by the success of BERT-like models (Devlin et al., 2019; Liu et al., 2019; Clark et al., 2020) in natural language understanding (NLU) tasks, many existing studies leverage pre-trained LMs for table understanding, achieving superior results (Chen et al., 2020b; Schlichtkrull et al., 2021). In this paper, we apply DeBERTa (He et al., 2021b) for sentence-table encoding, as its positional encoding scheme can effectively capture positional information of the input, which is useful for sentence-table encoding.
Given an input token at position i, DeBERTa represents it using two vectors, {H_i} and {P_{i|j}}, which encode its content and its relative position with respect to the token at position j, respectively. For a single-head self-attention layer, DeBERTa (He et al., 2021b) computes the disentangled self-attention scores as follows:

Ã_{i,j} = Q^c_i (K^c_j)^⊤ + Q^c_i (K^r_{δ(i,j)})^⊤ + K^c_j (Q^r_{δ(j,i)})^⊤

where Ã_{i,j} represents the attention score between token i and token j. The content vector H is projected by the matrices W_{q,c}, W_{k,c}, W_{v,c} ∈ R^{d×d} to generate the projected content vectors Q^c, K^c, and V^c, respectively; P ∈ R^{2k×d} is the relative position embedding matrix, and δ(i, j) is the relative distance from token i to token j. Similarly to H, P is projected by the matrices W_{q,r}, W_{k,r} ∈ R^{d×d} to generate the projected position vectors Q^r and K^r, respectively.
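As a rough illustration of this mechanism (a toy, single-pair sketch with pre-projected vectors, not DeBERTa's actual implementation), the three disentangled terms can be computed as:

```python
import math

# Toy sketch of DeBERTa's disentangled attention score for one pair (i, j),
# assuming a single head and that the projected vectors Qc, Kc, Qr, Kr are
# already given:
#   A~[i,j] = Qc_i . Kc_j          (content-to-content)
#           + Qc_i . Kr_{d(i,j)}   (content-to-position)
#           + Kc_j . Qr_{d(j,i)}   (position-to-content)
def rel_dist(i: int, j: int, k: int) -> int:
    """Clamped relative distance delta(i, j), mapped into [0, 2k)."""
    return max(0, min(2 * k - 1, (i - j) + k))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def disentangled_score(i, j, Qc, Kc, Qr, Kr, k, d_model):
    score = dot(Qc[i], Kc[j])                       # content-to-content
    score += dot(Qc[i], Kr[rel_dist(i, j, k)])      # content-to-position
    score += dot(Kc[j], Qr[rel_dist(j, i, k)])      # position-to-content
    return score / math.sqrt(3 * d_model)           # sqrt(3d) scaling
```

The sqrt(3d) scaling follows the DeBERTa paper, which scales by the number of score terms times the head dimension.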
In our implementation, we adopt the latest version, DeBERTaV3 (He et al., 2021a), which improves DeBERTa by further pre-training with replaced token detection (RTD), to jointly encode a sentence and a table. Unlike NL, tables have distinct structural information that is difficult for pre-trained LMs to capture. Therefore, we use special symbols to inject the structural information into an NL sentence. Specifically, we linearize the table into a token sequence using special markers.
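The linearization can be sketched as follows (a minimal version using the [Header]/[Row] markers and "|" cell separators adopted in this paper; the helper names are ours):

```python
# Minimal sketch of the table linearization: [Header] marks the header row,
# [Row] marks the start of each data row, and "|" separates cells.
def linearize(header: list, rows: list) -> str:
    parts = ["[Header] " + " | ".join(header)]
    for row in rows:
        parts.append("[Row] " + " | ".join(row))
    return " ".join(parts)

def encode_pair(statement: str, header: list, rows: list) -> str:
    # Concatenate the statement S with the linearized table S_T.
    return statement + " " + linearize(header, rows)

print(linearize(["episode", "viewers"], [["Night Moves", "3.61"]]))
# → [Header] episode | viewers [Row] Night Moves | 3.61
```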

Our PASTA Model
Figure 2 gives an overview of our PASTA framework, which follows the pre-training-then-fine-tuning paradigm. For pre-training, we guide our model to understand sentences and to perform table-based operations (e.g., Aggregation) to complete synthesized cloze tasks in the sentence. For fine-tuning, we apply a select-then-rank strategy to trade off between the sizes of large tables and the limited input length of DeBERTaV3. Next, we first present our sentence-table cloze task and pre-training corpus generation in Sections 3.1 and 3.2, respectively. We then discuss our fine-tuning strategy in Section 3.3. Inspired by Masked Language Modeling (MLM) (Devlin et al., 2019), we design a cloze task to pre-train the model's ability to reason about operations over tables. However, a key difference is that we do not use the random masking strategy of MLM, for the following reasons. First, masking a specific cell of the table and training the model to predict it is difficult because, unlike the words in a sentence, the contents of a cell may not be predictable from the contents of the surrounding cells. Moreover, the content of an individual cell may be useless for determining whether statement S can be entailed or refuted by table T. Second, not every token in the sentence needs to be predicted. For example, in the sentence "The Palazzo has more floors than Las Vegas Hilton.", tokens like "has" and "the", which can be easily predicted from contextual information, are not worth learning, because this kind of ability is already captured by pre-trained LMs.

Sentence-Table Cloze Pre-training
To pre-train the model to be aware of table operations, we propose to mask operation-aware tokens in the sentence, which need to meet two requirements: (i) they appear in the sentence and correspond to table-based operations, and (ii) they are predictable by reasoning over tables. For example, in the above sentence, as the model needs to find the numbers of floors for "Las Vegas Hilton" and "The Palazzo" in the table and then compare them, "more" is the operation-aware token that the model needs to predict.
To cover common types of operations in pre-training, we refer to the operation list defined in LPA (Chen et al., 2020a). We design six sentence-table cloze tasks according to various operation types: Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique. Figure 2 shows some example cloze tasks of the operation types Filter, Aggregation, and Comparative. The answer to a cloze task, i.e., the operation-aware tokens to be predicted, may be a specific table cell (e.g., "114") or the result of a series of calculations (e.g., "134.7", "more"). Note that we assume that only atomic operation types need to be learned in pre-training, and various combinations and expressions are left to fine-tuning. Thus, only one type of operation-aware token is masked in each statement.
We formally define the table-operations aware pre-training task as follows. Given a sentence S = {x_i} and a table T = {T_{i,j} | i ≤ m, j ≤ n}, we corrupt S into S̃ by masking the operation-aware span of tokens S_span = {x̃_i} ⊂ S, and then we train an LM parameterized by θ to reconstruct S by predicting the masked tokens {x̃_i}, i.e., optimizing the following objective:

max_θ Σ_{x̃_i ∈ S_span} log p_θ(x̃_i | S̃, T)
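A small sketch of this objective (with toy probability tables standing in for p_θ(· | S̃, T)): the loss is the negative log-likelihood of the gold operation-aware tokens at the masked positions only:

```python
import math

# Sketch of the span-reconstruction objective: sum negative log-likelihood
# over the masked operation-aware positions only. Each entry of pred_probs
# is a token->probability map standing in for the model's distribution at
# one masked position.
def span_nll(pred_probs: list, gold_tokens: list) -> float:
    assert len(pred_probs) == len(gold_tokens)
    return -sum(math.log(p[t]) for p, t in zip(pred_probs, gold_tokens))
```

Minimizing this loss is equivalent to maximizing the objective above; positions outside the operation-aware span contribute nothing.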

Pre-training Corpus Generation
Next, we introduce our strategy for generating the pre-training corpus, which consists of sentence-table pairs. For the table-operations aware pre-training task described in Section 3.1, the difficulty of corpus generation lies in collecting a large-scale set of sentence-table pairs and in identifying the operation-aware tokens in each sentence. To solve these problems, we propose an automatic data generation method, which consists of table collection and sentence generation. In addition, we introduce a probing-based sentence polishing method to make the sentences more fluent and natural.

Table Collection.
Inspired by previous work (Herzig et al., 2020; Schlichtkrull et al., 2021), we use WikiTables, which contains Web tables extracted from Wikipedia. Concretely, we only select well-formed relational tables that contain headers and at least one numeric column that can be used for operations. Moreover, considering the maximum input length (512 tokens) of our pre-trained LM, we filter out all tables with more than 500 cells. Based on the above process, we obtain a total of 580K tables from WikiTables, from which we randomly select 20K tables to improve the efficiency of pre-training.
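The filtering heuristic can be sketched as follows (the numeric-column check below is a simplification we chose for illustration, not necessarily the exact check used):

```python
# Hedged sketch of the table-filtering heuristic: keep tables that have a
# header, at least one numeric column, and at most 500 cells.
def is_numeric_col(col: list) -> bool:
    try:
        [float(v) for v in col]
        return True
    except ValueError:
        return False

def keep_table(header: list, rows: list) -> bool:
    if not header or not rows:
        return False
    if len(rows) * len(header) > 500:   # cell-count cap from the paper
        return False
    cols = list(zip(*rows))
    return any(is_numeric_col(list(c)) for c in cols)
```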
Sentence Generation. Figure 3 shows the pipeline of our automatic sentence generation method. To ensure that each sentence contains an operation and that the operation-aware tokens can be clearly identified, we design NL Templates for each table-aware operation type (e.g., the NL Template in Figure 3 is designed for Comparative; see more details of the manually designed templates in Appendix A). Each NL Template pre-defines the position of the operation-aware tokens (e.g., "higher"). Unlike fact verification, the sentence in the cloze task must be a correct description of the table. To achieve this, we design an SQL Template for each NL Template, and the two templates are instantiated based on the table at the same time. During instantiation, the [Column] in both templates is replaced by a column header (e.g., [Column1] is replaced by team) and the [Value] is instantiated by a cell (e.g., [Value] is replaced by 97). Then, the SQL Instance is automatically executed on the table, and the execution result [ANS] is filled into the NL Instance to ensure its correctness.
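A mock-up of this instantiate-and-execute step (the table contents, template strings, and [Value] choice below are illustrative; PASTA's actual templates are given in Appendix A):

```python
import sqlite3

# Hypothetical mock-up of the template-instantiation pipeline: an SQL
# template and an NL template are instantiated together, the SQL instance
# is executed on the table, and its result fills the [ANS] slot of the NL
# instance, guaranteeing the sentence correctly describes the table.
nl_template = "The team with a score higher than [Value] is [ANS]."
sql_template = "SELECT team FROM t WHERE score > [Value]"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (team TEXT, score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("Lakers", 97), ("Bulls", 101)])

value = "97"
sql_instance = sql_template.replace("[Value]", value)
(ans,) = conn.execute(sql_instance).fetchone()
sentence = nl_template.replace("[Value]", value).replace("[ANS]", ans)
# sentence == "The team with a score higher than 97 is Bulls."
```

Masking "higher than" in the resulting sentence then yields a Comparative cloze whose answer is verifiable against the table.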
Based on the above method, we generate up to 100 related sentences for each table, depending on the size of the table. Statistics about the pre-training corpus are given in Table 1. We can see that the proportion of each type is different; it mainly depends on how many different expressions a type contains. Taking Aggregation, the type with the highest proportion, as an example: in addition to "the average of" in Figure 2, there may also be "the sum of", "the total amount of", etc.
Sentence Polishing. We notice that using fixed templates for each table can generate unnatural sentences. For example, in Figure 3, if [Column2] is populated with "age" instead of "score", then the operation-aware token should be "older" instead of "higher". Therefore, we introduce a probing-based method to improve the fluency of the sentences, which leverages the rich knowledge learned implicitly by LMs during pre-training. Our main idea is that since BERT-like LMs are pre-trained on extensive textual corpora, their predictions can approximate the natural language expressions used in real-world scenarios.
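A toy sketch of this probing step (the scorer below is a stand-in for the frozen LM's [MASK] distribution; the candidate words and scores are illustrative):

```python
# Sketch of probing-based polishing: fill [MASK] with whichever candidate
# the (here, mocked) frozen LM finds most probable in context.
def polish(template: str, candidates: list, lm_score) -> str:
    """template already contains a [MASK] slot; lm_score(masked, cand)
    returns the LM's probability for cand at the masked position."""
    best = max(candidates, key=lambda c: lm_score(template, c))
    return template.replace("[MASK]", best, 1)

toy_scores = {"older": 0.7, "higher": 0.2, "more": 0.1}
sentence = polish(
    "Alice is [MASK] than Bob in age.",
    ["higher", "more", "older"],
    lambda m, c: toy_scores[c],
)
# sentence == "Alice is older than Bob in age."
```

In the real pipeline the scorer would query the fixed pre-trained LM rather than a lookup table.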
Specifically, we identify the context-sensitive word w′ in each template (e.g., "higher"), and define a set of candidate values (e.g., "higher", "more", ..., "older") for w′. Then, we replace w′ with [MASK] and leverage a fixed LM to determine the most appropriate candidate.

Datasets

During fine-tuning, we evaluate our model on two widely-adopted table-based fact verification benchmark datasets, TabFact (Chen et al., 2020a) and SEM-TAB-FACTS (Wang et al., 2021b). TabFact contains 16K tables collected from WikiTables and 118K human-annotated natural language statements, where each statement-table pair is labeled as either entailed or refuted. TabFact contains statements with two difficulty levels: (i) simple statements corresponding to single rows, and (ii) complex statements involving multiple rows with table-based operations like Aggregation. SEM-TAB-FACTS contains 2K tables and 4K human-annotated natural language statements. Different from TabFact, these tables are collected from scientific articles in a variety of domains. We use the official splits of the two benchmarks for evaluation: the training, validation, and test sets of TabFact contain 92,283, 12,792, and 12,779 sentence-table pairs, respectively; the training, validation, and test sets of SEM-TAB-FACTS contain 4,506, 423, and 522 sentence-table pairs, respectively. In addition, TabFact holds out a small test set of 2K sentence-table pairs with human performance.

Baselines
We compare PASTA with the following ten state-of-the-art methods for table-based fact verification. LogicFactChecker (Zhong et al., 2020) leverages a sequence-to-action semantic parser to generate a "program", i.e., a tree with multiple operations, and uses a graph neural network to encode statements, tables, and the generated programs. SAT (Zhang et al., 2020) creates a structure-aware mask matrix to encode the structural data. In particular, it recovers the alignment information of tabular data by masking the signals of unimportant cells during self-attention. ProgVGAT (Yang et al., 2020) integrates programs and their execution into a natural language inference model. This method uses a verbalization-with-program-execution model to accumulate evidence and constructs a graph attention network to combine the various pieces of evidence. SaMoE (Zhou et al., 2022) develops a mixture-of-experts (MoE) network based on the RoBERTa-large model (Liu et al., 2019). The MoE network consists of different experts, and a management module decides the contribution of each expert network to the verification result. Volta (Gautam et al., 2021) analyzes how transfer learning and standardizing tables to contain a single header row can boost the effectiveness of table-based fact verification. LKA (Zhao and Yang, 2022) studies the sentence-table evidence correlation. It develops a dual-view alignment module based on the statement and table views to identify the most important words through various interactions.

Implementation Details
Our model is implemented based on the Transformers library (Wolf et al., 2020). Specifically, we start pre-training from the public DeBERTaV3-Large checkpoint and optimize the learning objective with Adam (Kingma and Ba, 2015). Our pre-training process runs for up to 400K steps with a batch size of 16 and a learning rate of 1 × 10⁻⁶. The complete pre-training procedure takes about 3 days on 2 RTX A6000 GPUs. For fine-tuning, the model runs for up to 300K steps with a batch size of 8 and a learning rate of 5 × 10⁻⁶.

Overall Performance
Table 2 summarizes the overall experimental results of the various fact verification methods on TabFact. We can see that PASTA achieves new SOTA results on all splits of TabFact. In particular, on the complex set containing multiple operations, PASTA largely outperforms the previous state-of-the-art by 4.7 points (85.6% vs. 80.9%).
Table 2 also reports the good performance of DeBERTaV3 on the table-based fact verification task. This result is analogous to the observation in Schlichtkrull et al., 2021 that RoBERTa can yield strong performance exceeding the previous closed-setting SOTA (77.6% vs. 74.4%). Both results illustrate that BERT-like models pre-trained on textual data can also perform well on linearized tabular data. However, PASTA surpasses DeBERTaV3 by 3.1 points on the test set (89.3% vs. 86.2%), i.e., by 3.9 points and 2.7 points on the simple and complex test sets, respectively. This experimental result shows that PASTA endows DeBERTaV3 with more powerful statement-table reasoning ability, which is crucial for fact verification.
Table 3 shows that PASTA outperforms all baseline models by large margins on the SEM-TAB-FACTS dataset. In particular, PASTA significantly surpasses the DeBERTaV3 model by 5.2 points (84.1% vs. 78.9%) on the test set. This shows that although our pre-training corpus only contains tables from Wikipedia, it can be applied to other domains, such as the tables from scientific articles in the SEM-TAB-FACTS dataset.

With increasing training steps, PASTA first becomes capable of reasoning about Comparative operations, and finally masters Aggregation and Filter operations. This may be attributed to the difficulty of the operations (e.g., Aggregation is harder than the other types) and the length of the token span that needs to be predicted (as shown in Table 1, Filter needs to predict more tokens than the other types).

Operation understanding on fact verification.
We further analyze whether the model can utilize the reasoning ability learned from pre-training for our downstream task, i.e., table-based fact verification. To this end, we evaluate PASTA on test sets of different operation types. We split the test set of TabFact according to the trigger words in the statements, which are defined in Appendix B, e.g., "highest" and "lowest" for the Superlative type. We control the size of each test set to be 200, while ensuring that the sets have no overlap. We compare PASTA and DeBERTaV3 on these test sets, and the results are shown in Table 4. We can see that PASTA outperforms DeBERTaV3 on every test set, especially on the Aggregation type. Note that we did not use any fine-tuning strategy for either model, and thus all the improvements of PASTA are to be attributed to our table-operations aware pre-training strategy.
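The trigger-word split can be sketched as follows (the word lists shown are small illustrative subsets; the full lists are defined in Appendix B):

```python
# Illustrative sketch of splitting statements by trigger words. A statement
# is assigned to the first operation type whose trigger set it matches.
TRIGGERS = {
    "Superlative": {"highest", "lowest"},
    "Aggregation": {"average", "sum", "total"},
    "Comparative": {"more", "less", "higher", "lower"},
}

def operation_type(statement: str):
    tokens = set(statement.lower().split())
    for op, words in TRIGGERS.items():
        if tokens & words:
            return op
    return None
```

Statements matching no trigger set are left out of the six operation-specific test sets.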
Comparison with Masked Language Modeling. We compare PASTA with the random masking scheme of Masked Language Modeling (MLM). For MLM, we randomly mask 15% of the tokens in a sentence-table pair, of which 10% remain unchanged, 10% are replaced with randomly picked tokens, and the remainder are replaced with the [MASK] token. For PASTA, we use the masking strategy introduced in Section 3.1, which only masks the operation-aware span in the sentence. We pre-train both MLM and PASTA on DeBERTaV3. Considering pre-training efficiency, for both MLM and PASTA, we set the number of training steps to 140K. Table 5 shows the fine-tuned results of MLM and PASTA on the TabFact dataset. We can see that PASTA outperforms MLM by a large margin on the complex set. This improvement further proves that our table-operations aware pre-training task helps with the high-order symbolic reasoning required by the complex set. Also observing Table 2, we find that MLM slightly decreases performance relative to DeBERTaV3 (84.9% vs. 86.2%). This may be because the random masking scheme of MLM does not work for sentence-table joint understanding, as analyzed in Section 3.1.
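For reference, the MLM baseline's masking can be sketched as follows (a simplified, token-list-only version; the 10/10/80 split follows the description above):

```python
import random

# Sketch of the random-masking baseline: mask 15% of positions; of those,
# 10% keep the original token, 10% get a random vocabulary token, and the
# remaining 80% become [MASK].
def random_mask(tokens: list, vocab: list, rng: random.Random) -> list:
    out = list(tokens)
    n = max(1, round(0.15 * len(tokens)))
    for i in rng.sample(range(len(tokens)), n):
        r = rng.random()
        if r < 0.10:
            pass                        # keep the original token
        elif r < 0.20:
            out[i] = rng.choice(vocab)  # replace with a random token
        else:
            out[i] = "[MASK]"
    return out
```

In contrast to PASTA's span masking, the selected positions here carry no guarantee of being operation-relevant.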

Impact of Select-then-Rank
To verify the effectiveness of the select-then-rank method, we conduct experiments with column-wise selection and row-wise ranking on the TabFact dataset. The results are shown in Table 6. We can see that the row-wise ranking strategy is more effective than the column-wise selection strategy. The main reason may be that the disentangled attention mechanism in DeBERTa makes the model more sensitive to the positional information of the input. The row-wise ranking strategy places the most relevant cells in the table closer to the sentence, so the model can more effectively capture the sentence-table relationship.

Error Analysis
To analyze the errors of PASTA on table-based fact verification, we examine the sentence-table pairs that PASTA predicts incorrectly on the TabFact dataset. Specifically, we consider the size of the tables and the complexity of the operations in the statements. Table 7 presents some basic statistics about the subset of the test set where PASTA makes mistakes. For comparison, we also list the basic statistics of DeBERTaV3's error set and of the full test set. We have the following observations. (1) Our model may not perform well on large tables. Concretely, although PASTA reduces the impact of large tables on the fact verification task compared to DeBERTaV3 (97.5 vs. 107.4), the impact of large tables on PASTA still exists compared to the average table size in the test set (97.5 vs. 89.0).
(2) The number of operations in the statement is also an important cause of errors: the proportion of statements with multiple operations in PASTA's error set (16.5%) is larger than that in the overall test set (11.3%). Thus, PASTA correctly verifies most of the statements that contain only a single operation, but it still has difficulty verifying statements that contain multiple types of operations.

Comparison with SaMoE (Zhou et al., 2022). The main difference is that SaMoE uses different experts to solve different types of operations, while our method assumes that the operation combinations in statements are complex and diverse. We directly inject the atomic operations into the model through operation-aware pre-training, and then fine-tune the model to learn the various operation combinations on the downstream datasets. However, PASTA is compatible with the mixture-of-experts framework, and we plan to evaluate PASTA with an MoE structure on table-based fact verification datasets in the future.

Conclusion and Future Work
We introduced PASTA, a table-operations aware pre-training approach that trains LMs to better perform fact verification over tables. PASTA achieved new SOTA results on two widely-adopted table-based fact verification benchmark datasets, TabFact and SEM-TAB-FACTS. Future work should explore how to address the challenges of more complex operations and large tables in fact verification.

Limitations
The first limitation of our work is that our synthetic pre-training corpus may lack diversity. As explained in Section 3.2, to ensure the correctness and controllability of the sentences, we generate the pre-training corpus using human-designed natural language templates. While our insight is that only atomic operations need to be learned at pre-training time, which reduces the need for diversity, generating high-quality sentences with both diversity and controllability to support self-supervised learning is still a direction worth exploring.
The second limitation of our work is that fact verification is only supported over a single table. Although the TabFact (Chen et al., 2020a) dataset we used assumes that each statement can be verified by a single table, a more realistic scenario would be to combine information from multiple tables. Exploring how to do this effectively while overcoming the input-length limitation of BERT-like models is an important direction for future work.

Ethics Statement
Dataset Collection. For the pre-training dataset, we use the publicly available WikiTables (Bhagavatula et al., 2013) dataset as the table source and select high-quality relational tables from it. We then use these tables to generate entailed statements, instead of collecting statements from the Web. For the fine-tuning datasets, we use the publicly available datasets TabFact (Chen et al., 2020a) and SEM-TAB-FACTS (Wang et al., 2021b).
Intended Use and Misuse Potential. The goal of the fact verification task is to help identify misinformation. Our work focuses on verifying statements based on analysis over tables and aims to pre-train language models to be aware of common table operations, such as aggregating a column or comparing two tuples. It should be noted that while we treat the tables from TabFact and SEM-TAB-FACTS as trustworthy sources of evidence in our experiments, we do not assume that all tables on the Web are trustworthy and unbiased. Hence, this work could be misused by performing fact verification against unreliable or socially biased tables.

Figure 1: An example of table-based fact verification.

Figure 2: An overview of the pre-training and fine-tuning procedures of PASTA.

Figure 4: Accuracy on operation-aware cloze tasks at different training steps. For each operation, the size of its test set is 1K, and the test set does not contain any tables from the training set.
We use [Header] to indicate the beginning of the headers and [Row] to indicate the beginning of each row. Inspired by Liu et al., 2021, we also use "|" to separate cells. Afterwards, we concatenate the statement S and the linearized table S_T.

Table 1: Statistics of the pre-training corpus, where "Len (Ans)" represents the average length of the answer.

Column-wise Selection. To make the table size meet the input length limit of DeBERTaV3, we follow previous work (Chen et al., 2020a) and only select the columns of the table that contain entities linked to the statement, which results in a pre-processed table T′. Note that the reason for not selecting by rows is that some operations may involve an entire column of cells, e.g., Aggregation.

Row-wise Ranking. To place the sentence and its relevant cells in the table at closer positions, we reorder the table T′ by row. Specifically, we slice T′ into a set of rows {r_1, ..., r_m}, and rank these rows by their relevance scores {p_i}. Finally, we reconstruct the table by ordering the rows in descending order of relevance score, before applying the table linearization introduced in Section 2.2.
The LM then determines the appropriate value for [MASK] based on the context. For example, if [Column2] is populated with "age", the pre-trained LM computes probabilities for all candidate values and selects the one with the highest probability, e.g., "older". Please refer to Appendix A for the detailed definition of context-sensitive words and their candidate sets.

Fine-tuning with Select-then-Rank

For fine-tuning, we pre-process the table based on the following two considerations: (i) as mentioned in Eisenschlos et al., 2020, the sentence-table pairs in the downstream datasets may be too long for pre-trained LMs; and (ii) since the disentangled attention mechanism in DeBERTa makes the model more sensitive to the positional information of the input, we assume that it is easier for the model to capture the sentence-table relationship when the most relevant cells of the table are placed closer to the sentence. To address these two problems, we propose a select-then-rank method, consisting of column-wise selection and row-wise ranking, to reconstruct the table content. The relevance score of a row is computed as follows: let r̂_i and ŝ denote the token sets of row r_i and statement s, respectively; the relevance score p_i is given by |r̂_i ∩ ŝ|, where we remove the stopwords (e.g., "the") from r̂_i and ŝ.
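The row scoring and re-ordering just described can be sketched as follows (the stopword list is an illustrative subset):

```python
# Minimal sketch of select-then-rank row scoring: p_i = |tokens(r_i) n
# tokens(s)| after stopword removal; rows are re-ordered by descending p_i.
STOPWORDS = {"the", "a", "an", "of", "is"}

def tokens(text: str) -> set:
    return {w for w in text.lower().split() if w not in STOPWORDS}

def rank_rows(rows: list, statement: str) -> list:
    s = tokens(statement)
    return sorted(rows, key=lambda r: len(tokens(r) & s), reverse=True)

rows = ["Night Moves | 3.61", "Alone | 2.87"]
ranked = rank_rows(rows, "night moves drew 3.61 million viewers")
# ranked[0] == "Night Moves | 3.61"
```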

Table-BERT (Chen et al., 2020a) adopts templates to linearize a table into an NL sentence, and then directly leverages a BERT model to encode the linearized table and the statement.
Tapas (Herzig et al., 2020) extends BERT with additional structure-aware positional embeddings to represent the tables. Eisenschlos et al., 2020 further pre-train Tapas on counterfactually-augmented and grammar-based synthetic statements. Schlichtkrull et al., 2021 study table-based fact verification in an open-domain setting, and combine a TF-IDF retrieval model with a RoBERTa-based joint reranking-and-verification model. Tapex (Liu et al., 2021) guides the pre-trained BART model to mimic an SQL executor via an execution-centric table pre-training approach. The pre-training corpus of Tapex is synthesized by sampling SQL queries from the SQUALL dataset (Shi et al., 2020).

Performance on cloze pre-training. As described in Section 3.1, PASTA is pre-trained on table-operations aware cloze tasks so as to be capable of reasoning about operations over tables. To explore whether the model has learned this ability, we generate six test sets corresponding to the different operation types, and evaluate the performance of PASTA at various steps during pre-training. The experimental results are reported in Figure 4. Overall, after 400K steps, PASTA can correctly complete more than 60% of the sentence-table cloze questions of various types.

Table 2: Performance on TabFact in terms of binary classification accuracy (%). The human performance on the small set is from Chen et al., 2020a. The notation "-" indicates that the corresponding values are not listed in the original paper. In addition, models are evaluated with 5 random runs.

Table 4: Binary classification accuracy (%) on sentence-table pairs containing different types of operations. The six sets are sampled from TabFact based on trigger words, and each set contains 200 sentence-table pairs.

Table 5: Ablation study of the masking scheme. MLM uses random masking and PASTA uses table-operations aware masking. To avoid co-effects, we did not use any data pre-processing method for MLM or PASTA. Models are evaluated with 5 random runs.

Table 6: Ablation study of the select-then-rank strategy. "w/o col" means that the column-wise selection strategy is not used, and "w/o row" means that the row-wise ranking strategy is not used.