Reinforcement Learning for Topic Models

We apply reinforcement learning techniques to topic modeling by replacing the variational autoencoder in ProdLDA with a continuous action space reinforcement learning policy. We train the system with the policy gradient algorithm REINFORCE. Additionally, we introduce several modifications: we modernize the neural network architecture, weight the ELBO loss, use contextual embeddings, and monitor the learning process by computing topic diversity and coherence at each training step. Experiments are performed on 11 data sets. Our unsupervised model outperforms all other unsupervised models and performs on par with or better than most models using supervised labeling. Our model is outperformed on certain data sets by a model using supervised labeling and contrastive learning. We also conduct an ablation study to provide empirical evidence of the performance improvements from our changes to ProdLDA, and find that the reinforcement learning formulation boosts performance.


Introduction
The internet contains large collections of unlabeled textual data. Topic modeling is a method to extract information from this text by grouping documents into topics and linking these topics with words describing them. Classical techniques for topic modeling, the most popular being Latent Dirichlet Allocation (LDA) (Blei et al., 2003), have recently begun to be overtaken by Neural Topic Models (NTMs) (Zhao et al., 2021).
ProdLDA (Srivastava and Sutton, 2017) is an NTM using a product of experts in place of the mixture model used in classical LDA. ProdLDA uses a variational autoencoder (VAE) (Kingma and Welling, 2013) to learn distributions over topics and words. ProdLDA improved on NVDM (Miao et al., 2016) by explicitly approximating the Dirichlet prior from LDA with a Gaussian distribution and using the Adam optimizer (Kingma and Ba, 2014) with a higher momentum and learning rate.
Viewing Reinforcement Learning (RL) as probabilistic inference has brought inference practices into the RL field (Dayan and Hinton, 1997; Levine, 2018). New algorithms using these techniques include MPO (Abdolmaleki et al., 2018) and VIREL (Fellows et al., 2019). MPO optimizes the evidence lower bound (ELBO), the same optimization objective used in VAEs.
Inspired by the adoption of probabilistic inference techniques in RL, we look to apply RL techniques to probabilistic inference in the realm of topic models. We use REINFORCE, the simplest policy gradient (PG) algorithm, to train a model which parameterizes a continuous action space, corresponding to the distribution of topics for each document in the topic model. We keep the product of experts from ProdLDA to compute the distribution of words for each document in the topic model.
We additionally improve our topic model by using Sentence-BERT (SBERT) embeddings (Reimers and Gurevych, 2019) rather than bag-of-words (BoW) embeddings, modernizing the neural network (NN) architecture, adding a weighting term to the ELBO, and tracking topic diversity and coherence metrics throughout training. The model architecture is shown in Figure 1. Our method outperforms most other topic models. It is beaten only on some data sets by advanced methods that use document labels for supervised learning, while our procedure is fully unsupervised.
Our approach is a modification of the ProdLDA model (Srivastava and Sutton, 2017). The novelty of our approach is as follows:
• Using a parameterized RL policy to infer the topic distribution rather than the VAE used in ProdLDA.
• Removing the softmax applied to the topic distribution, which was required in ProdLDA to approximate the simplex from the Dirichlet distribution.
• Using the GELU activation function (Hendrycks and Gimpel, 2016) and layer normalization (Ba et al., 2016), mitigating the component collapse experienced by ProdLDA and allowing reversion of the increases in optimizer learning rate and momentum required for ProdLDA to function.
• Optionally removing the dropout layer applied to the topic distribution in ProdLDA, which we found to increase topic coherence but decrease topic diversity to a much greater degree.

Related Work
Zhao et al. (2021) provide a survey of NTMs. Variations of VAEs are presented which use different distributions, correlated and structured topics, or pre-trained language models, incorporate metadata, or model short texts rather than full documents. Methods other than VAEs are also used for neural topic modeling, including autoregressive models, generative adversarial networks, and graph NNs. Doan and Hoang (2021) compare ProdLDA and NVDM, along with six other NTMs and three classical topic models, in terms of held-out document and word perplexity, downstream classification, and coherence. Scholar (Card et al., 2017), an extension of ProdLDA taking document metadata and labels into account where possible, performed best in terms of coherence. NVDM and NVCTM (Liu et al., 2019), an extension of NVDM which additionally models the correlation between documents, performed best in terms of perplexity and downstream classification. The other NTMs were GSM (Miao et al., 2017), NVLDA (Srivastava and Sutton, 2017), NSMDM (Lin et al., 2019), and NSMTM (Lin et al., 2019). The classical topic models were non-negative matrix factorization (NMF) (Zhao et al., 2017), online LDA (Hoffman et al., 2010), and Gibbs sampling LDA (Griffiths and Steyvers, 2004).
BERTopic (Grootendorst, 2022) and Top2Vec (Angelov, 2020) use dimensionality reduction and clustering to group document embeddings from pre-trained language models into meaningful clusters. Contextualized Topic Models (CTM) (Bianchi et al., 2020a) augment the BoW embeddings used in ProdLDA with SBERT (Reimers and Gurevych, 2019) embeddings, resulting in an improved topic model. Dieng et al. (2020) develop the embedded topic model (ETM) by using word embeddings to augment a variational inference algorithm for topic modeling. Their method outperforms other topic models, especially on corpora with large vocabularies containing both common and very rare words. Nguyen and Luu (2021) augment Scholar (Card et al., 2017) with contrastive learning (Hadsell et al., 2006) and outperform all topic models compared against. Gui et al. (2019) use RL to filter words from documents, with reward defined as a combination of the resulting topic model's coherence and diversity, i.e., how few words overlap between topics. Kumar et al. (2021) use REINFORCE (Williams, 1992), a PG RL algorithm, to augment ProdLDA. Their model slightly outperforms ProdLDA in terms of topic coherence. Shahbazi and Byun (2020) use RL to augment a non-negative matrix factorization topic model.

Background
We briefly outline topic models, the RL framework, the KL divergence, and contextual embeddings.

Topic Models -Approaches
Latent Dirichlet Allocation (LDA) (Blei et al., 2003) is a three-level hierarchical Bayesian model: documents → topics → words. Each document is a mixture over latent topics, where the topic distribution θ is randomly sampled from a Dirichlet distribution. Each topic is a multinomial distribution over vocabulary words.
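To make the generative process concrete, the sketch below samples a single document under this three-level model (a toy illustration; the dimensions, priors, and variable names are ours, not values from any experiment).

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, doc_len = 3, 8, 10
alpha = np.full(n_topics, 0.1)  # Dirichlet prior over topic proportions
# Per-topic multinomial distributions over the vocabulary:
beta = rng.dirichlet(np.full(vocab_size, 0.05), size=n_topics)

theta = rng.dirichlet(alpha)    # topic distribution for one document
words = []
for _ in range(doc_len):
    z = rng.choice(n_topics, p=theta)                # draw a topic for this token
    words.append(rng.choice(vocab_size, p=beta[z]))  # draw a word from that topic
print(theta, words)
```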
Autoencoding Variational Inference for Topic Models (AVITM) (Srivastava and Sutton, 2017) is a neural topic model using a VAE to learn a Gaussian distribution over topics. VAEs use a reparameterization trick (RT) to randomly sample from the posterior distribution while remaining fully differentiable. At the time, there was no known RT for Dirichlet distributions, so AVITM used a Gaussian distribution and a Laplace approximation of the Dirichlet prior.
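The reparameterization trick for a diagonal Gaussian can be sketched in a few lines: the sample is rewritten as a deterministic function of the distribution parameters plus parameter-free noise, so gradients flow through µ and σ (a minimal PyTorch illustration; the names are ours).

```python
import torch

mu = torch.zeros(20, requires_grad=True)       # posterior mean (inference network output)
log_var = torch.zeros(20, requires_grad=True)  # posterior log-variance

eps = torch.randn_like(mu)                # parameter-free Gaussian noise
z = mu + torch.exp(0.5 * log_var) * eps   # differentiable sample from N(mu, sigma^2)
z.sum().backward()                        # gradients reach mu and log_var
```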
AVITM contains two models: NVLDA and ProdLDA. NVLDA uses the mixture model from LDA to infer a distribution over vocabulary words, while ProdLDA uses a product of experts.
Evidence Lower Bound (ELBO) is the optimization objective for AVITM. ELBO optimization (Jordan et al., 1999) simultaneously tries to maximize the log-likelihood of the topic model and minimize the forward Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) between the posterior P and prior Q topic distributions:

$$\text{ELBO} = D_{KL}(P \,\|\, Q) - \text{log-likelihood} \quad (1)$$

Topic Models -Evaluation
Topic Coherence is a metric for evaluating topic models. It uses co-occurrence in a reference corpus to measure semantic similarity between the top-K words in a topic. Topic model coherence is the average of each topic's coherence.
Normalized pointwise mutual information (NPMI) (Aletras and Stevenson, 2013) is the coherence measure found to correlate best with human judgment (Lau et al., 2014). When computing NPMI, Srivastava and Sutton (2017) use a window size of 20 for co-occurrence counts, while Dieng et al. (2020) use full-document co-occurrence.
NPMI coherence is calculated for each of the top-K words in a topic and averaged to obtain the coherence for that topic. The overall topic coherence is the average of the coherence of each topic. For a word i, the NPMI coherence is calculated according to Equation 2.

$$\text{NPMI}(w_i) = \sum_{j \neq i} \frac{\log \frac{P(w_i, w_j)}{P(w_i)\,P(w_j)}}{-\log P(w_i, w_j)} \quad (2)$$

where P(w_i) is the probability of word i occurring in a document in the corpus, and P(w_i, w_j) is the probability of words i and j co-occurring in a document in the corpus.

Topic Diversity is another metric for evaluating topic models. It measures the uniqueness of the top-K words across all topics. Dieng et al. (2020) use K = 25 for reporting topic diversity.

$$\text{topic-diversity} = \frac{\text{number-of-unique-words}}{K \times \text{number-of-topics}} \quad (3)$$

Topic Quality is a topic modeling metric introduced by Dieng et al. (2020):

$$\text{topic-quality} = \text{topic-coherence} \times \text{topic-diversity} \quad (4)$$
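A small reference implementation of Equations 2-4 under full-document co-occurrence may help; `doc_word` (a binary document-term matrix) and `topics` (lists of top-K word indices per topic) are hypothetical names, and this is a sketch rather than the paper's vectorized implementation.

```python
import numpy as np

def npmi_coherence(doc_word, topics, eps=1e-12):
    """Average NPMI over top-K word pairs per topic (Eq. 2), averaged over topics."""
    p = doc_word.mean(axis=0)                            # P(w_i): document frequency
    joint = (doc_word.T @ doc_word) / doc_word.shape[0]  # P(w_i, w_j): co-document frequency
    per_topic = []
    for topic in topics:
        vals = []
        for a in range(len(topic)):
            for b in range(a + 1, len(topic)):
                i, j = topic[a], topic[b]
                p_ij = joint[i, j] + eps
                vals.append(np.log(p_ij / (p[i] * p[j] + eps)) / -np.log(p_ij))
        per_topic.append(np.mean(vals))
    return float(np.mean(per_topic))

def topic_diversity(topics, k):
    """Fraction of unique words among the top-K words of all topics (Eq. 3)."""
    unique = {w for topic in topics for w in topic}
    return len(unique) / (k * len(topics))

# Topic quality (Eq. 4) is then npmi_coherence(...) * topic_diversity(...).
```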

Reinforcement Learning
RL is a sequential decision-making framework focused on finding the best sequence of actions executed by an agent (Sutton and Barto, 2018). An agent takes actions a ∈ A to traverse between states s ∈ S in an environment, receiving a reward r on each transition. The goal of an RL task is to find the best set of actions, referred to as the policy, which maximizes the reward. RL problems can be episodic, where the agent completes the environment and is reset, or continuing, where the agent continuously traverses the environment without reset. Through traversing the environment, the agent learns a policy π specifying which actions in each state will maximize return. Return is the cumulative reward received by the agent in an episode or its lifetime. It is usually discounted by a factor γ to favor near-term reward over long-term reward.
An alternative to discounting is the average reward formulation.
Policy Gradient (PG) Algorithms. Many RL algorithms learn a value function, representing the value associated with selecting specific actions, and a corresponding policy that chooses the action or subsequent state with maximum value. PG algorithms (Sutton et al., 1999) provide an alternative approach that directly learns a parameterized policy.
The parameters of the policy function are optimized through stochastic gradient ascent.
REINFORCE is a Monte Carlo PG algorithm for episodic problems (Williams, 1992). See Algorithm 1, where ρ is the vector of optimized parameters.
Algorithm 1: REINFORCE
Input: a differentiable parameterized policy function π(a|s, ρ)
Algorithm parameters: step size α > 0, discount factor γ
Initialize the policy parameters ρ
Loop for each episode:
    Generate an episode S_0, A_0, R_1, ..., S_{T-1}, A_{T-1}, R_T following π(·|·, ρ)
    Loop for each step of the episode t = 0, 1, ..., T-1:
        G ← Σ_{k=t+1}^{T} γ^{k-t-1} R_k
        ρ ← ρ + α γ^t G ∇_ρ ln π(A_t|S_t, ρ)

Continuous Action Spaces are one advantage of PG algorithms (Sutton and Barto, 2018). Parameterized policies allow action spaces that are parameterized by a probability distribution, such as a Gaussian. For Gaussian action spaces, the mean µ and standard deviation σ are given by function approximators parameterized by ρ. For a state s, an action a is sampled from the distribution and the policy is computed according to Equation 5.

$$\pi(a \mid s, \rho) = \frac{1}{\sigma(s, \rho)\sqrt{2\pi}} \exp\!\left(-\frac{(a - \mu(s, \rho))^2}{2\sigma(s, \rho)^2}\right) \quad (5)$$

Kullback-Leibler Divergence

The KL divergence (Kullback and Leibler, 1951) measures the similarity between two probability distributions P and Q. It is used in AVITM (Srivastava and Sutton, 2017) to force the posterior distribution parameterized by the VAE to be the Laplace approximation of the Dirichlet prior. The KL divergence calculation for N topics is shown in Equation 6.

$$D_{KL}(P \,\|\, Q) = \frac{1}{2} \sum_{i=1}^{N} \left( \frac{\sigma_{P,i}^2}{\sigma_{Q,i}^2} + \frac{(\mu_{Q,i} - \mu_{P,i})^2}{\sigma_{Q,i}^2} - 1 + \ln \frac{\sigma_{Q,i}^2}{\sigma_{P,i}^2} \right) \quad (6)$$
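A sketch of the Gaussian policy of Equation 5 in PyTorch (the variable names are ours): function approximators output µ and σ, an action is sampled, and its log-probability provides the gradient path for the REINFORCE update.

```python
import torch
from torch.distributions import Normal

mu = torch.zeros(20, requires_grad=True)         # per-dimension action means
log_sigma = torch.zeros(20, requires_grad=True)  # per-dimension log standard deviations

policy = Normal(mu, log_sigma.exp())  # Gaussian policy pi(a|s, rho) (Eq. 5)
action = policy.sample()              # sample an action from the distribution
log_prob = policy.log_prob(action).sum()
reward = torch.tensor(1.0)            # placeholder scalar reward
(-reward * log_prob).backward()       # REINFORCE: ascend reward * grad log pi
```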
Contextual Embeddings

Bianchi et al. (2020a) augment the BoW document representation used in ProdLDA with contextual embeddings from SBERT. They test three models: one with BoW, one with contextual embeddings, and one with both. They find that using both embeddings produces the best results, with the other two methods performing almost as well. One advantage of using solely contextual embeddings is that multilingual language models can encode documents from different languages into the same embedding space, enabling easy creation of multilingual topic models (Bianchi et al., 2020b).
Sentence-BERT is an extension of BERT using a Siamese network to extract semantically meaningful sentence embeddings (Reimers and Gurevych, 2019). In contrast to BERT, this allows SBERT embeddings to be compared using dot product or cosine similarity, making SBERT more suitable for tasks such as semantic similarity search and clustering.
Method

Network Architecture

For the inference network, we increase the number of units in each layer from 100 to 128, add weight decay of 0.01 to each layer, and place dropout layers (Srivastava et al., 2014) after each fully connected layer.
We replace the softmax activation after the topic distribution with an RL policy formulation (Equation 5). We use a training batch size of 1024 and clip all gradients to a maximum norm of 1.0 to prevent gradient explosion (Pascanu et al., 2013). Following Bianchi et al. (2020a), we set both distributional priors as trainable parameters. We lower the optimizer learning rate to 3e-4 and the momentum to 0.9.
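A hypothetical PyTorch reconstruction of this inference network follows (the two-layer depth and the 384-dimensional SBERT input are assumptions; weight decay of 0.01 would be supplied via the optimizer).

```python
import torch.nn as nn

n_topics, emb_dim, hidden = 20, 384, 128  # 384 = all-MiniLM-L6-v2 embedding size

inference_net = nn.Sequential(
    nn.Linear(emb_dim, hidden),
    nn.GELU(),             # GELU activation (Hendrycks and Gimpel, 2016)
    nn.LayerNorm(hidden),  # layer normalization (Ba et al., 2016)
    nn.Dropout(0.2),       # dropout after each fully connected layer
    nn.Linear(hidden, hidden),
    nn.GELU(),
    nn.LayerNorm(hidden),
    nn.Dropout(0.2),
)
mu_head = nn.Linear(hidden, n_topics)         # mean of the Gaussian action space
log_sigma_head = nn.Linear(hidden, n_topics)  # log-std of the Gaussian action space
```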

Document Embeddings
Following Bianchi et al. (2020a), we replace the BoW used by ProdLDA with contextualized embeddings from SBERT. We use the "all-MiniLM-L6-v2" model for encoding unpreprocessed documents as embedding vectors. BoW embeddings, used to calculate the log-likelihood of the topic model, are created using preprocessed documents.
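With the sentence-transformers library, this encoding step looks like the following (`docs` is a placeholder for the raw document list).

```python
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Raw, unpreprocessed document text.", "Another raw document."]
embeddings = encoder.encode(docs)  # one 384-dimensional vector per document
```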

Single-step REINFORCE with a Continuous Action Space
We adopt the view of RL as a statistical inference method (Levine, 2018). The modernized inference network from ProdLDA is used to parameterize a continuous action space from which an action is sampled, and the policy is computed according to Equation 5. The topic model distribution over vocabulary words uses the product of experts from ProdLDA. We use REINFORCE to train the network, with a weighted version of the ELBO as the reward. Each document embedding is a state in the environment, and each episode terminates after a single step (i.e., action). Each action is a sample from the topic distribution.
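A hedged sketch of one such single-step episode in PyTorch is given below; the module names `inference_net`, `mu_head`, `log_sigma_head`, and `product_of_experts` are ours, and a full implementation would also pass the reconstruction gradient to the word-distribution parameters.

```python
import torch
from torch.distributions import Normal, kl_divergence

def train_step(doc_emb, bow, inference_net, mu_head, log_sigma_head,
               product_of_experts, prior, params, optimizer, lam=5.0):
    h = inference_net(doc_emb)                            # state: SBERT embedding
    policy = Normal(mu_head(h), log_sigma_head(h).exp())  # Gaussian action space
    theta = policy.sample()                               # action: topic-distribution sample
    log_word_probs = product_of_experts(theta)            # ProdLDA-style decoder
    log_lik = (bow * log_word_probs).sum(-1)              # BoW log-likelihood
    kl = kl_divergence(policy, prior).sum(-1)
    reward = log_lik - lam * kl                           # weighted ELBO as reward
    # Score-function (REINFORCE) loss for the single-step episode:
    loss = -(policy.log_prob(theta).sum(-1) * reward.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, 1.0)           # clip to max norm 1.0
    optimizer.step()
    return loss.item()
```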

Weighted Evidence Lower Bound
Following Higgins et al. (2016), we allow modifiable relative entropy between the prior and posterior by weighting the KL divergence term in the ELBO.We define a hyperparameter λ as a multiplier on the KL divergence term.
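In our notation, the loss of Equation 1 thus becomes

$$\mathcal{L} = \lambda \, D_{KL}(P \,\|\, Q) - \text{log-likelihood},$$

where λ > 1 pulls the posterior more strongly toward the prior and λ < 1 relaxes that constraint, following the β-VAE interpretation of Higgins et al. (2016).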

Initial Experiments
We initially evaluate our topic model on the 20 Newsgroups data set with 20 topics. Results averaged over 30 random seeds are shown: loss in Figure 2, topic coherence in Figure 3, and topic diversity in Figure 4. Means and 90% confidence intervals are plotted. Topic diversity and coherence are calculated with K = 10. Documents are preprocessed following Bianchi et al. (2020a), with the additional step of removing all words with fewer than three letters. Models are trained for 1000 epochs with the AdamW optimizer (α = 3e-4, β₁ = 0.9, β₂ = 0.999) (Loshchilov and Hutter, 2017). We use λ = 5, inference network dropout of 0.2, and no dropout after the RL policy (policy dropout). All other experiments use these same settings unless otherwise noted.

Comparison to Other Topic Models
We compare our method to recent topic models found in the literature.

Benchmarking Neural Topic Models (BNTM)
We first compare our approach with all models evaluated by Doan and Hoang (2021). We use their preprocessed documents and replicate their results, using K = 10 to calculate topic coherence. Following the authors, we sweep from 0.5N topics to 3N topics in intervals of 0.5N (N being the "correct" number of topics for each data set).
Next, we perform a hyperparameter sweep over λ values of 1, 3, 5, and 10. Results are averaged over ten random seeds and shown in Figure 5.

Topic Modeling in Embedding Spaces
Next, we compare with Dieng et al. (2020) on the New York Times data set with 300 topics, without removing stop words. Results are shown in Table 1. We increase the batch size to 32768 and train for only 20 epochs on one random seed. Additionally, we increase the number of units in each layer of the inference network to 512, increase dropout in the inference network to 0.5, and decrease λ to 1. Topic diversity is calculated using K = 25.

Pre-training is a Hot Topic (PTHT)
We also compare our model, using all metrics, with the best model evaluated by Bianchi et al. (2020a). Results are shown in Table 2.

Contrastive Learning for Neural Topic Models

We compare results with the contrastive Scholar model from Nguyen and Luu (2021). For each data set, we perform a hyperparameter search with 50 topics. Search ranges and best results for each data set are shown in Table 3. We use the best hyperparameters from this search for final training runs with 50 and 200 topics. We train for 2000 epochs. Results are averaged over 30 random seeds and shown in Table 4.

To show the tradeoff between topic diversity and coherence, we perform a sweep over policy dropout from 0 to 0.9 at intervals of 0.1 using the 20 Newsgroups data set with 50 topics. Other hyperparameters are kept the same. We train for 2000 epochs. Results are averaged over 30 random seeds and shown in Figure 6.

Ablation Study
To provide empirical evidence that performance improvements come from the RL policy formulation, we conduct a study ablating relevant changes from the final RL model down to the original ProdLDA model. All comparisons are performed on the 20 Newsgroups data set with 20 topics and use the same settings as subsection 5.1. Results are averaged over 30 random seeds and shown in Table 5.

Discussion
For the initial experiments on the 20 Newsgroups data set, the average loss (Figure 2) reaches a near plateau around the 200th epoch. Past this epoch, coherence (Figure 3) continues to increase slowly, and topic diversity (Figure 4) increases substantially until around the 400th epoch, past which it also continues to increase slowly. This shows that training beyond a plateau in loss can still improve NTM performance.
Compared to Doan and Hoang (2021), the RL model performs on par with or better than other models across all four data sets (Snippets, 20 Newsgroups, W2E-title, and W2E-content), while the performance of other models varies greatly between data sets. We find topic diversity (Dieng et al., 2020) to be a more useful metric than inverse RBO, as it usually has a higher variance in values and is more intuitive to understand. For Word2Vec coherence, the RL model performs on par with the best of the other models, except when compared to ETM (Dieng et al., 2020).

Topic diversity and coherence values should both be provided when reporting topic model performance. In Figure 6, the highest topic quality is achieved when there is no policy dropout. Topic diversity can be sacrificed for some gain in coherence. Applications of topic models may want to maximize topic diversity, coherence, or both; the description of topic model performance should reflect this.
In the ablation study, removing the RL policy formulation causes the model to perform worse, confirming that the RL policy formulation compounds the improvements from the other changes to the model. Performance suffers the most when the softmax is re-added to the topic distribution during training. To recapture a softmax distribution over topics, it can instead be applied to the topic distribution during inference. Adding policy dropout significantly reduces topic diversity and leads to a slight reduction in coherence. Performance improves with SBERT embeddings, and the model can still reconstruct the BoW within the ELBO without direct access to it. Increasing λ to 5 improves performance, but as seen in other experiments, this is only sometimes the case.

Conclusion
Inspired by the introduction of probabilistic inference techniques to RL, we develop an NTM augmented with RL. Our model builds on ProdLDA, which uses a product of experts instead of the mixture model used in classical LDA. We improve ProdLDA by adding SBERT embeddings, an RL policy formulation, a weighted ELBO loss, and an improved NN architecture. In addition, we track topic diversity and coherence during training rather than only evaluating these metrics for the final model. Our fully unsupervised RL model outperforms most other topic models. It is only topped by contrastive Scholar, a method using supervised labels during training, in a few select cases. Our code is available at https://github.com/jeremycostello/rl-for-topic-models.

Limitations
The main limitation identified for our RL model is decreased performance as the vocabulary size increases. Our RL model also has higher variance than some of the other topic models to which we compared. While our RL model performed well on all the data sets tested, this performance may not extend to all data sets.

A Data Sets

A.1 20 Newsgroups
The 20 Newsgroups data set (Lang, 1995) consists of around 19,000 newsgroup posts from 20 topics. We perform experiments on this data set with three different preprocessing methods. For our initial experiments, we follow the preprocessing in Bianchi et al. (2020a) and additionally remove all words with fewer than three letters. For the comparisons with Bianchi et al. (2020a) and Nguyen and Luu (2021), we follow the preprocessing in Bianchi et al. (2020a). For the comparison with Doan and Hoang (2021), we use their already preprocessed data set.

A.2 New York Times
The New York Times data set (Sandhaus, 2008) consists of over 1.8 million articles written by the New York Times between 1987 and 2007. We follow the preprocessing from Bianchi et al. (2020a), but do not remove stop words.

A.3 Snippets
The Web Snippets data set (Ueda and Saito, 2002) consists of around 12,000 snippets of text from websites linked on "yahoo.com". The snippets are grouped into 8 domains. We use the already preprocessed data set from Doan and Hoang (2021).

A.4 W2E
The W2E data set (Hoang et al., 2018) consists of news articles from media channels around the world. The W2E-title subset contains the titles of the news articles, while the W2E-content subset contains the text content of the articles. The articles are grouped into 30 topics. We use the already preprocessed data set from Doan and Hoang (2021).

A.5 Wiki20K
The Wiki20K data set (Bianchi et al., 2020b) consists of 20,000 English Wikipedia abstracts randomly sampled from DBpedia. We follow the preprocessing from Bianchi et al. (2020a).

A.6 StackOverflow
The StackOverflow data set (Qiang et al., 2020) consists of around 16,000 question titles randomly sampled from 20 different tags in a larger data set crawled from the website "stackoverflow.com" between July and August 2012. We use the already preprocessed data set from Qiang et al. (2020).

A.7 Google News
The Google News data set (Qiang et al., 2020) consists of around 11,000 titles and short samples from Google News articles clustered into 152 groups.
We use the already preprocessed data set from Qiang et al. (2020).

A.8 Tweets2011
The Tweets2011 data set (Qiang et al., 2020) consists of around 2,500 tweets in 89 clusters sampled from the larger Tweets2011 corpus (McCreadie et al., 2012) crawled from Twitter between January and February 2011. We use the already preprocessed data set from Qiang et al. (2020).

A.9 IMDB

The IMDB data set (Maas et al., 2011) consists of 50,000 movie reviews, each with a binary positive or negative sentiment label.

A.10 Wikitext-103
The Wikitext-103 data set (Merity et al., 2016) consists of around 28,500 Wikipedia articles classified as either Featured articles or Good articles by Wikipedia editors. We follow the preprocessing from Bianchi et al. (2020a).

B Evaluation Metrics
We track topic diversity, coherence, perplexity, and loss for the training and test sets where applicable. Topic diversity and coherence are calculated based on the top-K words in each topic, with K noted for each experiment. We use NPMI coherence with co-occurrence based on full document windows.
Most previous NTMs have only reported the coherence of the final model, presumably because coherence is not tracked during training for computational reasons. To enable tracking of coherence during training, we modify a vectorized implementation of UMass coherence to calculate NPMI coherence and add caching for a further speed-up. We also implement a GPU-optimized algorithm to calculate topic diversity during training.
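On GPU, the diversity computation reduces to a top-K selection followed by a unique count; a minimal sketch with our variable names follows (`beta` is the topic-word weight matrix).

```python
import torch

def topic_diversity_gpu(beta, k=10):
    """beta: (n_topics, vocab_size) tensor of topic-word weights on the GPU."""
    top_words = beta.topk(k, dim=1).indices     # top-K word ids per topic
    n_unique = torch.unique(top_words).numel()  # unique words across all topics
    return n_unique / (k * beta.shape[0])       # Equation 3
```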
Tracking these metrics during training provides two main benefits. The first benefit is that if training is going poorly, it can be terminated early. Poor training could be caused by component collapse (low topic diversity) or by the model being unable to fit coherent topics (low coherence). The second benefit is enabling deeper performance comparisons between models and between training runs of a single model. Most existing NTMs only track loss and perplexity during training, so additionally tracking topic diversity and coherence can provide further insights into model performance.

C.1 Topic Words from Initial Experiments
We show one example of the top 10 words for all 20 topics from the initial experiments on the 20 Newsgroups data set, choosing the seed with the 15th highest coherence (out of 30 seeds). Topic words are shown in Table 8. Each document in the 20 Newsgroups data set is labeled as belonging to one of 20 categories; these categories are shown in Table 9.

C.2 Pre-training is a Hot Topic
We show a further comparison between the contextual embedding model from Bianchi et al. (2020a) and our RL model in Table 10. Average NPMI coherence over 30 seeds is compared for each number of topics: 25, 50, 75, 100, and 150.

C.3 Hyperparameters
We show the hyperparameters for each experiment we performed. Experiment seeds are generated with a meta-seed for reproducibility. The meta-seed is randomly chosen from the integers between 0 and 2^32.

C.3.1 Initial Experiments and Ablation Study
We use the same meta-seed for the ablation study as for the initial experiments. Hyperparameters for the initial experiments can be found in Table 11. Subsequent tables for all experiments only show hyperparameters that differ from this table. Hyperparameters for the ablation study can be found in Table 12.

C.3.2 Benchmarking Neural Topic Models
We show hyperparameters for the comparison with Doan and Hoang (2021). Hyperparameters can be found for Snippets in Table 15, 20 Newsgroups in Table 16, W2E-title in Table 17, and W2E-content in Table 19.

C.3.3 Topic Modeling in Embedding Spaces
Hyperparameters for the comparison with Dieng et al. (2020) can be found in Table 20.

C.3.4 Pre-training is a Hot Topic
We show hyperparameters for the comparison with Bianchi et al. (2020a). Data set and seed information can be found in Table 13. All other hyperparameters are the same for each data set; these can be found in Table 18.

C.3.5 Contrastive Learning for NTM
We show hyperparameters for the comparison with Nguyen and Luu (2021). Some hyperparameters are already shown in Table 3 and are not repeated here. Data set and seed information can be found in Table 14. Other hyperparameters are the same for each data set; these can be found in Table 21. Hyperparameters for the policy dropout sweep can be found in Table 22.

C.4 Ablation Study
We show full results from the ablation study in Table 23.

D Model Parameter Count
The number of parameters (P) in the model depends on the total number of parameters across all inference layers (L), the number of topics (N), and the vocabulary size (V). The trainable parameters are the inference layers, the prior distribution over topics (N parameters), and the distribution of words over topics (V × N parameters). The total parameter count can be calculated with Equation 8.

$$P = L + N + (V \times N) \quad (8)$$
The largest model we use is for the Wikitext-103 data set with 200 topics. This model has 4,001,224 parameters.

E Future Work
We have identified some possible paths for future work. The SBERT embeddings could be fine-tuned during training rather than being calculated during pre-processing and frozen during training.

Figure 1: Architecture diagram. Gray boxes: processing; white boxes: models/data/information; arrows across boxes: tune-ability.

Figure 5: Comparison of RL model (ours) to BNTM models

Table 2: Comparison with Bianchi et al. (2020a). Metrics are averaged over 25, 50, 75, 100, and 150 topics, with 30 seeds for each number of topics. We use the same preprocessing as the authors and λ = 1.

Table 1: Comparison with Dieng et al. (2020) on the New York Times data set

Table 3: Hyperparameter search ranges and best results per data set for the RL model

We evaluate models on the test set where available, and on the training set if there is no test set. Coherence and diversity are the same for the training and test sets, as they are evaluated on the word distribution over topics, which does not change per document. In the code, training coherence and diversity are computed after each batch, while test coherence and diversity are computed after each epoch. The numbers of training/test documents and vocabulary sizes are shown in Table 6. Average original and preprocessed training document lengths are shown in Table 7. All other data sets are obtained from the recent literature. No sensitive information is used or inferred in this paper. The risk of harm from our model is low. Any artifacts in this paper are used following their intended use cases.

Table 6: Data Sets - Documents and Vocabularies

Table 23: Full Results from Ablation Study