Learning Interpretable Style Embeddings via Prompting LLMs

Style representation learning builds content-independent representations of author style in text. Stylometry, the analysis of style in text, is often performed by expert forensic linguists, and no large dataset of stylometric annotations exists for training. Current style representation learning uses neural methods to disentangle style from content to create style vectors; however, these approaches result in uninterpretable representations, complicating their usage in downstream applications like authorship attribution where auditing and explainability are critical. In this work, we use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations we call LISA embeddings. We release our synthetic stylometry dataset and our interpretable style models as resources.


Introduction
Style representation learning aims to represent the stylistic attributes of an authored text. Prior work has treated the style of a text as separable from the content. Stylistic attributes have included, but are not limited to, linguistic choices in syntax, grammar, spelling, vocabulary, and punctuation (Jafaritazehjani et al., 2020). Style representations should represent two texts with similar stylistic attributes more closely than texts with different attributes, independent of what content is present in the texts.
Stylometry, the analysis of style, applies forensic linguistics to tasks like authorship attribution. Stylometry often relies on semi-manual analysis by forensic linguistic experts (Mosteller and Wallace, 1963; Holmes, 1994; Rosso et al., 2016). Computational stylometry often uses rule-based approaches utilizing count-based features like the frequencies of function words (Stamatatos, 2009; Koppel et al., 2009; Tausczik and Pennebaker, 2010). More modern, neural approaches attempt to learn style representations in an unsupervised fashion through a proxy task like style transfer (Shen et al., 2017; Fu et al., 2018; John et al., 2019; Dai et al., 2019; Li et al., 2019; Yi et al., 2021; Zhu et al., 2022) or authorship verification (Boenninghoff et al., 2019; Hay et al., 2020; Zhu and Jurgens, 2021; Wegmann et al., 2022). These stronger neural approaches, unlike simpler frequency-based techniques, are uninterpretable. This makes it difficult to effectively analyze their representations and their failure modes, and precludes their usage in real-world authorship attribution scenarios because interpretability and verification are critical for legal admissibility (Tiersma and Solan, 2002).
Figure 1: An example of a 768-dimensional interpretable style vector produced by LISA, trained using a GPT-3 annotated synthetic stylometry dataset.
With this motivation, we propose a human-interpretable style representation model $M$ which, for a given text $t$, produces a $D$-dimensional vector $M(t) \in [0.0, 1.0]^D$. Each dimension corresponds to one of $D$ style attributes $\{a_0, a_1, \ldots, a_{D-1}\}$. Each element at dimension $d$ of this vector is constrained to the range $[0.0, 1.0]$ and represents the probability of the corresponding style attribute $a_d$ being present in the text $t$. See Figure 1 for a visualization of a result from our final trained model with $D = 768$ dimensions.
An immediate obstacle to training such a model is that no large dataset of texts with stylometric annotations currently exists; annotating a large number of texts on a wide variety ($D = 768$) of stylistic attributes would likely require annotators with linguistic expertise and be prohibitively expensive. Given this, we use GPT-3 (Brown et al., 2020), a large language model (LLM), and zero-shot prompts to generate a synthetic dataset we call STYLEGENOME of human-interpretable stylometric annotations for various texts. Our approach is motivated by recent works showing models trained on synthetic datasets annotated by prompting LLMs can match and sometimes even outperform models trained on human-labeled datasets (Wang et al., 2022; Gilardi et al., 2023; Huang et al., 2022; Honovich et al., 2022). Training on STYLEGENOME, we develop the Linguistically-Interpretable Style Attribute (LISA) embedding model. We summarize our primary contributions:
1. We outline an unsupervised method for producing interpretable style embeddings using zero-shot prompting and distillation.
2. We generate and release STYLEGENOME, a synthetic stylometry dataset with ~5.5M examples, the first large-scale dataset with texts paired with a wide range of stylometric annotations.
3. We train, evaluate, and release LISA, the first-ever interpretable style embedding model with a wide variety of linguistic dimensions ($D = 768$). We find LISA matches the performance of existing style representations, while allowing for explainability and transparency.

Generating STYLEGENOME
To create STYLEGENOME for training LISA, we select Reddit data from the Million User Dataset (MUD) (Khan et al., 2021; Andrews and Bishop, 2019) to stylometrically annotate, following prior work that uses Reddit to source a diversity of styles from different authors (Wegmann et al., 2022). We sample 10 random posts per author for 1,000 random authors, resulting in 10,000 total posts selected for annotation. The posts from our Reddit authors capture a diversity of styles, varying in formality, punctuation, emoji usage, etc.
We use GPT-3 (Brown et al., 2020; Ouyang et al., 2022) to annotate each post in two stages.
Stage 1 We use a zero-shot prompt to elicit from the model a description of the style of a given Reddit post:
Prompt: Write a long paragraph describing the unique grammar style of the following passage without referring to specifics about the topic.
Passage: Wow that's really cool. I'll have to watch for that next time I watch it.
Description:

↓
Generation: The passage reflects a conversational style in its grammar, with short, simple sentences that are easy to understand. The language is informal and direct, with the speaker expressing their enthusiasm for the topic in a straightforward manner. The speaker also uses contractions, such as "I'll," to emphasize their excitement. The speaker also uses a casual tone, with the phrase "really cool" to emphasize their interest in the topic. The speaker also uses the present tense to express their anticipation for the future, with the phrase "next time I watch it." This style of grammar is effective in conveying the speaker's enthusiasm and interest in the topic.
Stage 2 We use another zero-shot prompt to standardize the generations from Stage 1 into short, declarative sentences with a uniform structure:
Prompt: Here's a description of an author's writing style for a passage: The passage utilizes a conversational style of grammar, with short...
Rewrite this description as a long list of short sentences describing the author's writing style where each sentence is in the format of "The author is X." or "The author uses X.".
Output:

↓
Generation: The author is using a conversational style of grammar. The author is using short, simple sentences. The author is using language that is informal and direct. The author is expressing enthusiasm for the topic in a straightforward manner. The author is using contractions, such as "I'll". The author is using a casual tone. The author is emphasizing their interest in the topic with the phrase "really cool". The author is using the present tense to express anticipation for the future.
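The two-stage annotation above can be sketched as plain prompt templating. This is a minimal sketch, not the authors' code: the names `STAGE1_TEMPLATE`, `STAGE2_TEMPLATE`, and `annotate`, and the `complete` callable (a stand-in for any LLM completion call), are hypothetical; only the prompt wording is taken from the examples above.

```python
from typing import Callable

# Prompt wording copied from the Stage 1 example above (one of the open-ended prompts).
STAGE1_TEMPLATE = (
    "Write a long paragraph describing the unique grammar style of the "
    "following passage without referring to specifics about the topic.\n"
    "Passage: {passage}\n"
    "Description:"
)

# Prompt wording copied from the Stage 2 standardization example above.
STAGE2_TEMPLATE = (
    "Here's a description of an author's writing style for a passage: "
    "{description}\n"
    "Rewrite this description as a long list of short sentences describing "
    "the author's writing style where each sentence is in the format of "
    '"The author is X." or "The author uses X.".\n'
    "Output:"
)


def annotate(passage: str, complete: Callable[[str], str]) -> str:
    """Run Stage 1, then Stage 2. `complete` is a hypothetical LLM call
    that maps a prompt string to a generated string."""
    description = complete(STAGE1_TEMPLATE.format(passage=passage)).strip()
    return complete(STAGE2_TEMPLATE.format(description=description)).strip()
```

Passing the completion function in as an argument keeps the sketch independent of any particular API client.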
We run both stages with 93 different Stage 1 prompts for all 10,000 posts. Six of these are open-ended prompts (Appendix A.1), such as the grammar style prompt shown above. The remaining 87 prompts target narrow and specific dimensions of style:
1. Write a description of whether the author of the following passage has any figurative language ...
2. Write a description of whether the author of the following passage has any swear words ...
3. Write a description of whether the author of the following passage has any repeated words ...
The 87 targeted prompts are derived from surveys of stylometry literature, and they cover all of Tausczik and Pennebaker (2010)'s linguistic and psychological categories. See Appendix A.2 for more details. We report the results of an ablation experiment between the two Stage 1 prompt categories in Appendix C. Appendix D details dataset annotation costs.
STYLEGENOME The output of Stage 2 is sentence tokenized and filtered to keep only sentences beginning with "The author". We refer to these sentences as human-interpretable style attributes. Our method annotates the texts with nearly 1.3M style attributes. These style attributes are represented in natural language, so "The author creates a conversational tone" and "The author has a conversational tone" are counted separately in the raw dataset. Our training procedure in Section 3.1 is able to train directly on these natural language style attributes, obviating a normalization step. Some annotations may be hallucinated, resulting in a noisy dataset, but we choose to train on the full synthetic dataset, without manual intervention, to maintain an unsupervised procedure following prior work (Wang et al., 2022). We hypothesize our model will find signal in the noise, which we evaluate in Section 4. The final dataset statistics can be found in Table 1.
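The tokenize-and-filter step described above can be sketched as follows. This is an illustration under assumptions: the regex splitter is a naive stand-in for whatever sentence tokenizer was actually used, and the function name `extract_style_attributes` is hypothetical.

```python
import re
from typing import List


def extract_style_attributes(stage2_output: str) -> List[str]:
    """Split a Stage 2 generation into sentences and keep only those
    that begin with "The author"."""
    # Naive splitter (assumption): break after '.', '!' or '?' when
    # followed by whitespace. A real sentence tokenizer is more robust.
    sentences = re.split(r"(?<=[.!?])\s+", stage2_output.strip())
    return [s.strip() for s in sentences if s.strip().startswith("The author")]
```

Because no normalization is applied, near-paraphrases such as "The author creates a conversational tone." and "The author has a conversational tone." remain distinct attributes, as in the raw dataset.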

Method
We first distill stylometric annotation knowledge from GPT-3 into a Style Feature Agreement Model (SFAM). Given a text $t$ and a style attribute $a$ as input, $\text{SFAM}(t, a)$ produces an agreement score between 0.0 and 1.0 representing the probability of the style attribute being present in the text. By selecting a set of $D$ style attributes $\{a_0, a_1, \ldots, a_{D-1}\}$, we can use SFAM to construct LISA, our interpretable style representation model that produces $D$-dimensional vectors:
$$M_{\text{LISA}}(t) = \big(\text{SFAM}(t, a_0), \text{SFAM}(t, a_1), \ldots, \text{SFAM}(t, a_{D-1})\big)$$
The Euclidean distance between style vectors for two texts, $\lVert M_{\text{LISA}}(t_2) - M_{\text{LISA}}(t_1) \rVert_2$, would not be particularly meaningful. We can multiply the style vectors by a trained weight vector $w$ or a weight matrix $W$, which act as simple interpretable embedding layers. This operation would make the Euclidean distance more meaningful, for example $\lVert M_{\text{LISA}}(t_2) * w - M_{\text{LISA}}(t_1) * w \rVert_2$. We call the result of a LISA style vector multiplied by $w$ or $W$ a LISA style embedding. We discuss training in detail next, leaving hyperparameter and implementation specifics to Appendix E.
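To make the construction concrete, the sketch below builds a style vector by scoring a text against each attribute and then computes the weighted Euclidean distance for the weight-vector case. It is illustrative only: `sfam` is a hypothetical callable standing in for the trained SFAM model, and the function names are not from the released code.

```python
import math
from typing import Callable, List, Sequence


def lisa_vector(
    text: str,
    attributes: Sequence[str],
    sfam: Callable[[str, str], float],
) -> List[float]:
    """One agreement score in [0, 1] per style attribute."""
    return [sfam(text, attribute) for attribute in attributes]


def weighted_distance(
    v1: Sequence[float], v2: Sequence[float], w: Sequence[float]
) -> float:
    """Euclidean distance after element-wise scaling by the weight vector w,
    i.e. || v2 * w - v1 * w ||_2."""
    return math.sqrt(sum((wi * (b - a)) ** 2 for a, b, wi in zip(v1, v2, w)))
```

With a weight vector, each dimension keeps its one-to-one link to a named style attribute, so a large weight can be read as "this attribute matters more for distinguishing authors".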

SFAM
We use distillation (Ba and Caruana, 2014) to teach the stylometric annotation capabilities of GPT-3 to EncT5 (Liu et al., 2021; Raffel et al., 2020), a smaller, more efficient student model.
Table 2: Correlation of agreement scores produced by SFAM against human judgments on texts over a wide variety of linguistic and authorship dimensions. The natural language style attributes used as input to SFAM when producing the agreement scores for each dataset are also provided.

Sampling Batches
We train EncT5 with a binary classifier head on randomly sampled batches of examples $(x_i, y_i)$ where each batch contains an equal number of positive ($y_i = 1$) and negative ($y_i = 0$) examples. The input $x_i$ consists of a style attribute $a$ and an author's text $t$ concatenated in a string "{{a}}|||{{t}}", for example $x_i$ = "The author is using a positive tone.|||You got this ;)". Labeled pairs from STYLEGENOME are sampled as positive examples such that each style attribute is sampled with equal probability. For each positive example, we perform negative sampling and retrieve a negative example text where the positive example's style attribute is likely not present. To do this, we find the 10,000 most dissimilar style attributes to the positive example's style attribute with SBERT similarity. We select a text that is positively labeled with a randomly selected dissimilar style attribute as the negative example text.
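The batch construction above can be sketched as follows. This is a minimal illustration, not the released training code: `texts_by_attribute` and `dissimilar` are hypothetical data structures, and `dissimilar` is assumed to be precomputed (in the paper, from SBERT similarity over the 10,000 most dissimilar attributes).

```python
import random
from typing import Dict, List, Tuple


def sample_batch(
    texts_by_attribute: Dict[str, List[str]],
    dissimilar: Dict[str, List[str]],
    num_pairs: int,
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Return 2 * num_pairs examples, half positive and half negative."""
    attributes = sorted(texts_by_attribute)
    batch: List[Tuple[str, int]] = []
    for _ in range(num_pairs):
        # Each style attribute is sampled with equal probability.
        attribute = rng.choice(attributes)
        positive_text = rng.choice(texts_by_attribute[attribute])
        # Negative sampling: take a text labeled with a dissimilar attribute,
        # where the original attribute is likely not present.
        other = rng.choice(dissimilar[attribute])
        negative_text = rng.choice(texts_by_attribute[other])
        batch.append((f"{attribute}|||{positive_text}", 1))
        batch.append((f"{attribute}|||{negative_text}", 0))
    return batch
```

Note that the negative example keeps the positive example's attribute and swaps only the text, so the classifier learns when an attribute is absent rather than which attribute strings are common.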

Training and Inference
Training over the ~1.3M unique style attributes in STYLEGENOME, our training dataset for SFAM is effectively a multitask mixture. Presenting style attributes to the model in natural language allows the pre-trained T5 encoder to jointly learn between style attributes and generalize to unseen style attributes using the semantic information in those natural language descriptions. This is especially desirable since some style attributes only have a handful of text examples, while others may have thousands. This setup resembles the multitask mixture trained on in Raffel et al. (2020). To validate training, we hold out 50 random style attributes that have between 30 and 50 examples each as a validation set. We validate learning during training by measuring the ability of SFAM to generalize and produce accurate agreement scores for the unseen style attributes. At inference, we softmax the binary class logits to interpret them as probabilities and we take the probability of $y_i = 1$ as the agreement score. We also study the effect of the size of STYLEGENOME on performance and find that as the synthetic dataset grows, validation performance improves and SFAM generalizes to better predict agreement scores for unseen style attribute and text pairs (see Appendix B).
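The inference step, turning the two class logits into an agreement score, is a two-way softmax. The sketch below shows the arithmetic in plain Python; the function name and argument order are illustrative assumptions, not the model's actual interface.

```python
import math


def agreement_score(logit_negative: float, logit_positive: float) -> float:
    """Softmax over the two class logits; returns P(y = 1)."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logit_negative, logit_positive)
    e_neg = math.exp(logit_negative - m)
    e_pos = math.exp(logit_positive - m)
    return e_pos / (e_neg + e_pos)
```

For two classes this is equivalent to applying a sigmoid to the logit difference, which is why the score always lies in [0.0, 1.0].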

LISA Style Vectors
As discussed earlier, SFAM is directly used to produce the LISA interpretable style vectors. We arbitrarily choose $D = 768$ in this work, following the dimensionality of prior style vectors and BERT (Devlin et al., 2019). We now detail how we select the style attributes associated with each dimension $\{a_0, a_1, \ldots, a_{767}\}$ with little manual intervention. The first 87 attributes $\{a_0, a_1, \ldots, a_{86}\}$ directly correspond to the features of our 87 targeted prompts in the form of "The author is using {{targeted_feature}}". The remaining 681 are downselected from the ~1.3M style attributes with filtering heuristics, choosing those that appear for at least 10 authors, but no more than 600 authors (attempting to select for frequent, yet discriminative attributes). Once a style attribute is selected to be part of the 768, we do not select another style attribute with SBERT cosine similarity > 0.8 to largely avoid near-duplicates. We also reject style attributes for selection that are undesirable for interpretability. Examples of LISA can be found in Figure 1 and Table 3.
With 768 dimensions, producing a single LISA vector would require 768 inferences of SFAM, a computationally expensive operation. To address this, we produce the LISA representations for 1,000,000 random Reddit posts from MUD. We then distill into a new EncT5 model with 768 regression labels. We hold out 10,000 examples as a validation set. After distillation to the dedicated model, the 768-dimensional style vector can be produced in a single forward pass with minimal degradation (validation MSE = 0.005).
Table 3: The five highest scoring dimensions from the 768-dimensional LISA vector produced on various Reddit texts. The interpretable style attribute corresponding to each dimension is displayed along with the score. We manually inspect the top style attributes and annotate them as reasonable, plausible, or incorrect. Attributes annotated with * blur the line between style and content. Error analysis can be found in Section 4.1.
Text: 13 POT MORDE BABY WOOOOOOOOOOOOOOOO
1. 1.00 - The author is using an elongated word.
2. 1.00 - The author is using only English words.
3. 1.00 - The author is using a single word.
4. 1.00 - The author is using swear words.
5. 1.00 - The author uses two exclamation marks.
Text: No wonder everyone resorts to performing a murder spree eventually.
Text: Podcast originally refers to an iPod, and before that there was definitely TWiT, which still calls itself a Netcast
1. 0.97 - The author uses a variety of words to describe the same concept.
2. 0.97 - The author is simply describing a product.*
3. 0.95 - The author uses specific terms related to the topic.
4. 0.94 - The author has a deep understanding of the topic.
5. 0.94 - The author is using words focusing on the past.
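The selection heuristics for the 681 non-targeted dimensions (an author-frequency window, then rejection of near-duplicates by cosine similarity) can be sketched as a greedy filter. This is a sketch under stated assumptions: `embed` is a hypothetical stand-in for an SBERT encoder, the visiting order (most to least frequent) is an assumption since the text does not specify one, and the manual rejection of attributes undesirable for interpretability is not modeled.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_attributes(
    author_counts: Dict[str, int],
    embed: Callable[[str], Sequence[float]],
    limit: int,
    min_authors: int = 10,
    max_authors: int = 600,
    max_similarity: float = 0.8,
) -> List[str]:
    """Greedily pick up to `limit` frequent, non-redundant attributes."""
    selected: List[str] = []
    selected_vecs: List[Sequence[float]] = []
    # Assumption: visit candidates from most to least frequent.
    candidates = sorted(author_counts, key=lambda a: (-author_counts[a], a))
    for attribute in candidates:
        if not (min_authors <= author_counts[attribute] <= max_authors):
            continue  # too rare, or too common to be discriminative
        vec = embed(attribute)
        if any(cosine(vec, s) > max_similarity for s in selected_vecs):
            continue  # near-duplicate of an already selected attribute
        selected.append(attribute)
        selected_vecs.append(vec)
        if len(selected) == limit:
            break
    return selected
```

In the setting described above, `limit` would be 681, with the 87 targeted attributes prepended to reach 768 dimensions.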

LISA Style Embeddings
We experiment with two different simple and interpretable embedding layers, a weight vector ($w_{768}$) and a weight matrix ($W_{768 \times 64}$). We attach these on top of the LISA model and train just the layer using a contrastive learning objective and triplet loss (Khosla et al., 2020; Schroff et al., 2015). We also experiment with two different authorship datasets from prior works to train the embedding layer; we refer to these datasets as the Wegmann dataset (Wegmann et al., 2022) and the LUAR dataset (Rivera-Soto et al., 2021). Like the prior work, we assume an author has consistent style between their different texts. Given some anchor text by an author, we use another text by the same author as a positive example, and a text by a different author as a negative example for our triplets. This objective minimizes the distance between two texts by the same author and maximizes the distance between texts by different authors, learning a meaningful metric.
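The triplet objective can be illustrated with the weight-vector layer. This is a minimal sketch, not the training code: the function names are hypothetical, the margin default of 1.0 follows Appendix E.3, and the use of plain (non-squared) Euclidean distance is an assumption, since the text does not say which variant was used.

```python
import math
from typing import List, Sequence


def embed(style_vector: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weight-vector embedding layer: element-wise scaling of a LISA vector."""
    return [x * wi for x, wi in zip(style_vector, w)]


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive; otherwise the size of the violation."""
    d_pos = math.dist(anchor, positive)  # same-author distance
    d_neg = math.dist(anchor, negative)  # different-author distance
    return max(0.0, d_pos - d_neg + margin)
```

Only `w` (or `W`) is trained under this loss; the underlying LISA vector stays fixed, which is what keeps each input dimension interpretable.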

Results
We first evaluate to what degree SFAM, which is ultimately used to build LISA representations, learns useful stylometric annotation capabilities that align with human reviewers. We then evaluate LISA itself on STEL, a framework purpose-built for evaluating the quality of style measures (Wegmann and Nguyen, 2021). All evaluations were completed after the collection of the STYLEGENOME dataset.
Correlation to Human Judgments We conduct a broad set of 55 studies across 21 datasets in 7 distinct categories of linguistic style and authorship dimensions in Table 2. We measure the correlation of SFAM's agreement scores to human judgments. SFAM performs better on dimensions like formality, sentiment, and emotion than on dimensions like linguistic acceptability. This is likely an artifact of the effectiveness of GPT-3 in annotating these categories, an expected result given prior work has shown language models struggle with identifying linguistic acceptability (Warstadt et al., 2020). Interestingly, SFAM demonstrates some limited ability to perform authorship profiling, a task adjacent to stylometry. The ability to probe SFAM in an interpretable manner helps identify which categories of features it can reliably represent, whereas prior approaches were more opaque. Overall, the Table 2 results demonstrate SFAM's annotations do correlate with human judgments on some important dimensions of style. We hypothesize future research with larger datasets (> 10,000 posts), more diverse sources of texts, and larger and more performant LLMs may further broaden and improve learned stylometric annotation capabilities.
STEL In Table 4, we provide the results of evaluating LISA using STEL. The STEL task evaluates whether two texts with similar styles can be matched using the distance/similarity metric defined by a style representation. We compare with other content-independent style representations, or methods that explicitly limit representation of content in favor of style. LISA explicitly limits the representation of content through the 768 style-focused attributes that act as a bottleneck. Content-aware representations like SBERT, on the other hand, have direct access to the text and may be able to represent the content in the text to an extreme degree, representing the usage of a specific rare word or discussion of a specific concept. We provide the results of content-aware representations simply for reference. We find LISA embeddings are able to closely match (and on average slightly outperform) prior style representations on STEL while providing interpretability.
Table 5: Sentence-level LISA vectors over each sentence from a longer passage of text can help identify and quantify which sentences contribute to overall style attributes scored on the longer passage, providing granular interpretability.
Text: Subscribed. Interesting idea. I would like to see some advanced stats on hitting percentages to different locations on the court. For example, having the court broken up into maybe 12 zones and then hitting percentages from each position to those zones. I remember seeing an article that did this years ago and I have never been able to find anything online. I said recently on this sub that the deep angle shot from the left side or right side was the highest percentage shot in volleyball, but I was not able to back up my claim with any sources or anything. Anyways, I am a VB nerd, no doubt. Interested to see what direction you take this. Cheers!
Style Attribute: The author is being polite.
Text: Yeah I also work in QA, and seeing this kind of stuff get released is maddening. About a year ago working on a new platform we were seeing bugs in the hundreds each week, we pushed back the release of the product 3 months because basically it didn't work. If it was up to the devs, they'd have released it on time, because the stuff they'd written code for worked. Thorough doesn't even cover the work we go through every 3 months, and Niantic's approach seems completely amateur from this side. They're putting bandaids on problems and hiding things like the 3 step problem behind curtains without seemingly fixing anything, although I do have to say their balance tweaks to battling have been a big step in the right direction.
Style Attribute: The author is using a personal anecdote to illustrate their point.
Text: Thank you. I'd be interested in reading more about your experiences, in addition to the "American Wedding" story. Are you watching the stream? I wish there was a way to find out how many people in the world are watching it. The music is lovely, huh? God damn. He's got his bunny Fair Isle sweater on, drinking Dunkin' Donuts coffee. I would have thought him a Starbucks man. :-)
Style Attribute: The author is using an emoji.
Sentence-Level Interpretability In Table 5, we demonstrate how visualizing a dimension of sentence-level LISA vectors can help explain which sentences contribute to a dimension activated on a passage-level LISA vector.
Forensic Interpretability Typically for authorship attribution tasks, content-aware representations that capture both content and style are used to make a determination. Author style, however, is still an important component in determining attribution (Rivera-Soto et al., 2021). Offering a clear explanation and presenting supporting evidence is crucial, particularly in the context of forensic analysis, such as when presenting evidence in a court trial. Explainability has often been overlooked in neural approaches to authorship attribution tasks.
To motivate this as a future research direction using our interpretable stylometric representations and our general approach, we provide an example of explanations on the Contrastive Authorship Verification task from Wegmann et al. (2022) in Table 6 with LISA (LUAR + W). Further examples and discussion on how the top common and distinct style attributes are ranked can be found in Appendix G.

Error Analysis
We highlight insights and observations around common failure modes of our technique in this section. We annotate the common failure modes with their percentage rate of occurrence.
Content vs. Style Attributes (3%) It is unclear whether style and content can truly be separated, as some content features are important for style or profiling an author (Jafaritazehjani et al., 2020; Bischoff et al., 2020; Patel et al., 2022). Even after filtering, 3% of dimensions of LISA still represent content. For example, "The author is using words related to the game they are discussing". However, while LISA may have the ability to represent that two texts are both discussing the topic of video games, it does not have the direct ability a content-aware approach would have to represent which specific video game is being discussed, due to the limited set of 768 features that act as a bottleneck. Our approach also allows visibility into how much of the representation derives from content-related features, while other neural representations are opaque and may use content-related features in a way that cannot be easily assessed.
Table 6: Example interpretable explanations on the Contrastive Authorship Verification task. The top style attributes in common between the Anchor text and a text by the Same Author are shown. The top distinct style attributes between the Anchor text and a text by a Different Author are also shown. The scores of each style attribute against the texts are shown as (Anchor, Same Author/Different Author). Attributes annotated with * blur the line between style and content. Error analysis can be found in Section 4.1. Further examples and details on style attribute ranking can be found in Appendix G.
Anchor: Devices that use two pronged instead of three pronged plugs are required to meet certain safe design requirements. Among other things, if a device has a switch, the switched line MUST BE hot, not neutral. The polarized plugs make sure that the right prong/wire is hot. This is why devices that have no switches (primarily wall warts) need not have polarized plugs.
Same Author: Your diaphragm would be trying to contract against the air pressure in your lungs. That's why deep sea diving requires regulators, to match the pressure of the air supply to the pressure surrounding your rib cage. You can breathe against a maximum of about 1/2 PSI, which is not enough pressure to adequately oxygenate your blood.
Top 3 Common Style Attributes:
(0.89, 1.00) - The author is using a scientific approach.
(0.96, 0.98) - The author is using a combination of technical terms and everyday language.
(0.91, 0.84) - The author is using formal and professional language.
Different Author: That's great! I'm glad it seems to be finding its' niche. Now if they could just make a Star Wars version of this game, I'd happily swallow that fat learning curve and overcome my frustrations with the combat system. ;)
Top 3 Distinct Style Attributes:
(0.06, 0.99) - The author is using words related to the game they are discussing.*
(0.00, 0.88) - The author is using an emoji.
(0.02, 0.87) - The author uses an emoticon at the end.

Conflating Style Attributes with Content (2%)
For some style attributes, LISA conflates the content of text with the presence of the style attribute. For example, "The author is cautious" may have a high agreement score on any text containing the word "caution" even if the author is not actually expressing caution in the text.
Spurious Correlations (6%) For other style attributes, LISA has learned spurious correlations. For example, "The author uses two exclamation marks" often has a high agreement score on any text that is exclamatory in nature, even when it does not actually use exclamation marks. An example can be found in Table 3.
Fundamental Errors (10%) LISA sometimes produces a high agreement score for text displaying the polar opposite of a style attribute or produces a high agreement score for an attribute that simply is not present in the text. Table 3 demonstrates some of these incorrect examples. Inspecting our dataset, this error happens both due to EncT5's internal representations likely aligning on relatedness instead of similarity (Hill et al., 2015) and due to hallucination and annotation errors by GPT-3. Hallucinated generations are a common issue with any LLM-guided approach and we discuss it further in Limitations along with potential future mitigations.

Conclusion
In this work, we propose a promising novel approach to learning interpretable style representations. To overcome a lack of stylometrically annotated training data, we use an LLM to generate STYLEGENOME, a synthetic stylometry dataset. Our approach distills the stylometric knowledge from STYLEGENOME into two models, SFAM and LISA. We find that these models learn style representations that match the performance of recent direct neural approaches and introduce interpretability grounded in explanations that correlate with human judgments. Our approach builds towards a research direction focused on making style representations more useful for downstream applications where such properties are desirable, such as in a forensic analysis context. Future directions that introduce human-in-the-loop supervised annotations or newer, larger, and better aligned LLMs for annotation have the potential to yield further gains in both performance and interpretability.
Model and Data Release We release our dataset (STYLEGENOME) and our two models (SFAM and LISA) to further research in author style.

Limitations and Broader Impacts
Limitations Handcrafted features by forensic linguists typically rely on frequency counts of word usage, usage of unique words or phrases, etc. (Mosteller and Wallace, 1963). The space of these kinds of features is non-enumerable and would not be well-represented with our technique that scores a fixed set of 768 interpretable features. Pure neural approaches may capture these kinds of features, but are non-interpretable and may capture undesirable content-related features. We explicitly trade off the use of these kinds of features in this work to achieve interpretability. While we demonstrate in Table 2 that our synthetic annotations are enough for a model to learn to identify stylistic properties in text, they cannot be fully relied on yet for the reasons we discuss in Section 4.1. As large language models scale and improve, however, we believe this work could benefit from increasing coherency and decreasing hallucination in the annotations (Kaplan et al., 2020). STYLEGENOME is collected only on 10,000 English Reddit posts; however, larger datasets may improve performance as we show in Figure 2, and future research in multilingual LLMs may make it feasible to replicate this procedure for other languages.
Ethical considerations Style representations are useful for text style transfer (Riley et al., 2021) and in manipulating the output of machine-generated text to match a user's style, for example, in machine translation (Niu et al., 2017; Rabinovich et al., 2017). While style transfer can be a useful benign commercial application of this work, superior style representations may aid the impersonation of authors. We demonstrate how style representations may aid legitimate cases of authorship attribution, a task that is typically done by forensic linguist experts. Our work introduces an interpretable approach, an important step in legitimizing the use of computational models for authorship attribution by providing explanations for predictions that can be audited and verified.

Diversity and inclusion
We believe style representations that capture wider dimensions of style can aid in analyzing and representing minority writing styles in downstream applications like style transfer.

A.3 Standardization Prompt Templates
The descriptions of style generated from the prompts in Appendix A.1 and Appendix A.2 are substituted into the following standardization prompt:
Here's a description of an author's writing style for a passage: {{description}}
Rewrite this description as a long list of short sentences describing the author's writing style where each sentence is in the format of "The author is X." or "The author uses X.".
Output:
which transforms the verbose descriptions into short, declarative, uniform sentences beginning with "The author...," which are the final style attributes used in building the STYLEGENOME dataset that SFAM is trained on.

B Effect of STYLEGENOME Dataset Size
When training SFAM, we experiment with artificially limiting the size of the synthetic dataset, by limiting the number of authors in the dataset, to determine the effect of dataset size on the validation performance. In Figure 2, we find that as the synthetic dataset grows, validation performance improves and SFAM generalizes to better predict agreement scores for unseen style attribute and text pairs.

C Annotation Prompts Ablation
Prompts Used to Generate STYLEGENOME      Validation F1
Open-ended Prompts                        0.865
Targeted Prompts                          0.898
Open-ended Prompts & Targeted Prompts     0.920

D STYLEGENOME Annotation Cost
Our inference cost with the OpenAI API was priced at $0.02 / 1K tokens, with a cost of ~$8 to annotate 10 Reddit posts by a single author with all of our prompts. Our full dataset of 1,000 authors cost ~$8,000 to annotate.
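As a rough consistency check on these figures, the arithmetic below assumes the ~$8 per-author cost consists entirely of token charges at the stated price, and that tokens are spread evenly over the 93 Stage 1 prompts and their 93 Stage 2 rewrites per post. Both are simplifying assumptions, so the per-call figure is only indicative.

```latex
\begin{align*}
\text{tokens per author} &\approx \frac{\$8}{\$0.02 / 1\text{K tokens}} = 400\text{K tokens} \\
\text{tokens per post} &\approx \frac{400\text{K}}{10} = 40\text{K tokens} \\
\text{tokens per call} &\approx \frac{40\text{K}}{93 \times 2} \approx 215 \text{ tokens} \\
\text{total cost} &\approx 1{,}000 \text{ authors} \times \$8 = \$8{,}000
\end{align*}
```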

E Training Details E.1 SFAM
We use the EncT5 architecture (Liu et al., 2021) with a binary classifier in Hugging Face (Wolf et al., 2019). We randomly sample training batches of size 1,440 and use the AdamW optimizer with a learning rate of 0.001. We employ early stopping with a threshold of 0.01 on the validation set F1 metric and a patience of 50 batches.
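The early stopping rule can be sketched as below. The interpretation is an assumption: the threshold is read as the minimum improvement over the best value seen so far, and patience as the number of consecutive evaluations allowed without such an improvement. The class name `EarlyStopper` is hypothetical and this is not the authors' training loop.

```python
from typing import Optional


class EarlyStopper:
    """Stop once the metric has failed to improve by more than `threshold`
    for `patience` consecutive evaluations."""

    def __init__(self, patience: int, threshold: float, higher_is_better: bool = True):
        self.patience = patience
        self.threshold = threshold
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None
        self.bad_evals = 0

    def should_stop(self, metric: float) -> bool:
        if self.best is None:
            self.best = metric  # first evaluation sets the baseline
            return False
        gain = metric - self.best if self.higher_is_better else self.best - metric
        if gain > self.threshold:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

Under this reading, the SFAM setting corresponds to `EarlyStopper(patience=50, threshold=0.01)` evaluated on validation F1.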

E.2 LISA
We use the EncT5 architecture (Liu et al., 2021) with 768 regression labels in Hugging Face (Wolf et al., 2019) and use MSELoss. We randomly sample training batches of size 1,440 and use the AdamW optimizer with a learning rate of 0.001. We employ early stopping with a threshold of 1e-6 on the validation set MSE metric and a patience of 20 epochs.

E.3 LISA Embedding Layers
We experiment with two types of embedding layers, $w$ and $W$. We also experiment with two training datasets, the Wegmann dataset (Wegmann et al., 2022) and the LUAR dataset (Rivera-Soto et al., 2021). For LUAR, we use the train split with 5% held out as validation and we sample random authors as negative examples. For Wegmann, we use the Conversation dataset train split and the dev split as validation. We use a margin of 1.0, a batch size of 32, an AdamW optimizer with a learning rate of 0.001, and employ early stopping with a threshold of 0.001 on the validation set loss.

F Full SFAM Evaluation Results
4. ... see all 87 targeted prompts in Appendix A.2

Figure 2: Best F1 achieved by SFAM on a held-out validation set of examples at various dataset sizes.

Table 1: Statistics for the STYLEGENOME dataset.

Table 4: Results on STEL and STEL-or-Content, an evaluation framework for style measures proposed by Wegmann and Nguyen (2021) and Wegmann et al. (2022). "LIWC" results are from Wegmann and Nguyen (2021). "LISA" is the 768-dimensional style vector. "LISA (...)" uses LISA embeddings with the training dataset and embedding layer type denoted in (...). Gray indicates worse than random baseline performance on the adversarially challenging STEL-or-Content task. All approaches underperform on STEL-or-Content, but LISA approaches outperform or closely match existing style representation choices on STEL, while providing interpretability.

Table 7: Best F1 achieved by SFAM on a held-out validation set of examples with different sets of Stage 1 prompts used to annotate Reddit posts and generate the synthetic training data used during distillation.