Token Prediction as Implicit Classification to Identify LLM-Generated Text

This paper introduces a novel approach to identifying which large language model (LLM) generated a given text. Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task and directly fine-tune the base LM to perform it. We use the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments and compare our approach against the more direct approach of classifying from hidden states. Evaluation shows the exceptional performance of our method on the text classification task, highlighting its simplicity and efficiency. Furthermore, interpretability studies on the features extracted by our model reveal its ability to differentiate distinctive writing styles among various LLMs even in the absence of an explicit classifier. We also collected a dataset named OpenLLMText, containing approximately 340k text samples from humans and four LLMs: GPT3.5, PaLM, LLaMA, and GPT2.


Introduction
In recent years, generative LLMs have gained recognition for their impressive ability to produce coherent language across different domains. Consequently, detecting machine-generated text has become increasingly vital, especially where ensuring the authenticity of information is critical, such as in legal proceedings.
Traditionally, techniques like logistic regression and support vector machines (SVM) have been used for detection tasks, as explained by Jawahar et al. (2020). The analysis of textual features like perplexity has proven effective as well (Wu et al., 2023). Recent advancements have introduced the use of a language model itself to detect generated text, such as the AI Text Classifier released by OpenAI (2023) and the classifier of Solaiman et al. (2019).
However, the exponential growth in the number of parameters from hundreds of millions to hundreds of billions has significantly improved text generation quality, presenting an unprecedented challenge to the detection task. To overcome this challenge, we propose using the inherent next-token prediction capability of the base LM for the detection task, aiming not just to determine whether or not a text is generated but also to identify its source.

† The two authors contributed equally to this work.
⋆ The implementation of the classification model, training process, and dataset collection is publicly available at https://github.com/MarkChenYutian/T5-Sentinel-public
Related Work

Generated Text Detection
Learning-based approaches to machine-generated text detection can be broadly classified into two categories: unsupervised learning and supervised learning. Unsupervised approaches include GLTR, developed by Gehrmann et al. (2019), which uses linguistic features like top-k words to identify generated text. Another unsupervised approach, DetectGPT by Mitchell et al. (2023), employs a perturbation-based method: it generates modifications of the text via a pre-trained language model and then compares the log probabilities of the original and perturbed samples. Supervised approaches include GROVER (Zellers et al., 2020), which extracts the final hidden state and uses a linear layer for prediction in a discriminative setting. Energy-based models (Bakhtin et al., 2019) have also been investigated for discriminating between text from different sources. Solaiman et al. (2019) fine-tuned the RoBERTa model on GPT-2 generated text, achieving an accuracy of 91% on GPT2-1B in the GPT2-Output dataset.

Text-to-Text Transfer Transformer
The Text-to-Text Transfer Transformer (T5) (Raffel et al., 2020) has gained recognition for its simplicity in converting all text-based language problems into a text-to-text format. Raffel et al. and Jiang et al. (2021) have shown that T5 outperforms BERT-based models on various natural language processing tasks.
However, prior approaches have not emphasized the use of T5 for distinguishing which language model is responsible for text generation. Furthermore, existing approaches have not directly leveraged the next-token prediction capability of the model for this particular task. Our approach advances the field by choosing the T5 model as the base LM and using its next-token prediction capability to improve the accuracy and efficiency of identifying the origin of a text.

Data Collection
The dataset we collected, named OpenLLMText, consists of approximately 340,000 text samples from five sources: Human, GPT3.5 (Brown et al., 2020), PaLM (Chowdhery et al., 2022), LLaMA-7B (Touvron et al., 2023), and GPT2-1B (GPT2 extra large) (Radford et al., 2019). The OpenLLMText dataset, along with the responses collected from OpenAI & GPTZero, is publicly available on Zenodo. Human text samples are obtained from the OpenWebText dataset collected by Gokaslan and Cohen (2019). GPT2-1B text samples stem from the GPT2-Output dataset released by OpenAI (2019). As for GPT3.5 and PaLM, the text samples are collected with the prompt "Rephrase the following paragraph by paragraph: [Human_Sample]". However, instructing LLaMA-7B to rephrase human text samples proved ineffective, since LLaMA-7B is not fine-tuned for instruction following. Hence, we provided the first 75 tokens of each human sample as context to LLaMA-7B and obtained the text completion as the output. For further details, including the temperature and sampling method for each source, please refer to Table 3 in Appendix A.
We partitioned OpenLLMText into train (76%), validation (12%), and test (12%) subsets. The detailed breakdown is listed in Table 4 in Appendix A.
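The partitioning step above can be sketched as follows. This is a minimal illustration, assuming a seeded shuffle; the seed and shuffling strategy are our assumptions, not details from the paper.

```python
import random

def partition(samples, seed=42):
    """Shuffle and split samples into train/validation/test subsets
    using the 76% / 12% / 12% ratio described in the paper.
    The seed and shuffle strategy here are illustrative assumptions."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.76)
    n_val = int(n * 0.12)
    return (shuffled[:n_train],                       # 76% train
            shuffled[n_train:n_train + n_val],        # 12% validation
            shuffled[n_train + n_val:])               # remaining 12% test
```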

Data Preprocessing
We noticed stylistic differences among the models. For instance, LLaMA generates \\n for the newline character instead of \n as the other sources do. To address the inconsistency, we followed an approach similar to Guo et al. (2023) to remove direct indicator strings and transliterate indicator characters.
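A normalization step of this kind can be sketched as below. The escaped-newline fix is taken from the paper; the specific Unicode transliterations shown are assumed examples of "indicator characters", not the paper's actual table.

```python
def normalize(text):
    """Normalize stylistic artifacts so the classifier cannot use them
    as shortcuts. The curly-quote/dash mappings are illustrative
    assumptions; only the escaped-newline fix is from the paper."""
    text = text.replace("\\n", "\n")  # LLaMA-style escaped newlines
    translit = {
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2014": "-",                   # em dash
    }
    for src, dst in translit.items():
        text = text.replace(src, dst)
    return text
```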

Dataset Analysis
To avoid potential biases and shortcuts that the model could learn unexpectedly, we analyzed the length, punctuation, and token distributions of each source in OpenLLMText (see Appendix A).

Method
Our approach can be formulated as follows. Let Σ represent the set of all tokens. The base LM can be interpreted as a function LM : Σ* × Σ → R. Given a string s ∈ Σ* and a token σ ∈ Σ, LM(s, σ) estimates the probability of the next token being σ. Let Y denote the set of labels in OpenLLMText, which contains "Human", "GPT-3.5", etc. We establish a bijection f : Y → Y′, where Y′ ⊂ Σ acts as a proxy for the labels. By doing so, we reformulate the multi-class classification task Σ* → Y into a next-token prediction task Σ* → Y′. Hence, the multi-class classification task can be solved directly using LM:

ŷ = f⁻¹( argmax_{σ ∈ Y′} LM(s, σ) )    (1)

T5-Sentinel
T5-Sentinel is the implementation of our approach using the T5 model. Unlike previous learning-based approaches in which final hidden states are extracted and passed through a separate classifier (Solaiman et al., 2019; Guo et al., 2023), T5-Sentinel relies directly on the capability of the T5 model to predict the conditional probability of the next token. In other words, we train the weights and embeddings of the T5 model and encode the classification problem as a sequence-to-sequence completion task, as shown in Figure 1. We use reserved tokens that do not exist in the text dataset as Y′. During fine-tuning, we use the standard cross-entropy loss on the predicted label token.
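The label-proxy scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: `lm_next_token_prob` stands in for the fine-tuned T5's next-token probabilities, and the use of T5's `<extra_id_*>` sentinel tokens as the reserved proxy tokens is our assumption.

```python
# Sketch of classification-as-next-token-prediction.
LABELS = ["Human", "GPT-3.5", "PaLM", "LLaMA", "GPT2"]

# f : Y -> Y', mapping each label to a reserved token absent from the
# text data (T5 sentinel tokens are used here as an illustrative choice).
F = {label: f"<extra_id_{i}>" for i, label in enumerate(LABELS)}
F_INV = {tok: label for label, tok in F.items()}

def classify(text, lm_next_token_prob):
    """Return the label whose proxy token the LM considers most
    probable as the next token after `text` (Equation 1)."""
    scores = {tok: lm_next_token_prob(text, tok) for tok in F_INV}
    best_token = max(scores, key=scores.get)
    return F_INV[best_token]  # apply f^{-1}
```

With a real model, `lm_next_token_prob` would read the probability of each proxy token from the decoder's first-step output distribution.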

T5-Hidden
To evaluate the effect of omitting the additional classifier while accomplishing the same classification task, we also fine-tuned the T5 model with a classifier attached, denoted T5-Hidden.
As illustrated in Figure 2, the classifier in T5-Hidden takes the final hidden state from the decoder block of the T5 model and computes the probability of each label by applying a softmax over its output layer. T5-Hidden is trained under a configuration identical to that of T5-Sentinel.
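A T5-Hidden-style head amounts to a linear layer plus softmax over the final decoder hidden state, as sketched below. The weight shapes are illustrative assumptions; in the real model the head is trained jointly with the fine-tuned T5.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hidden_state_classifier(h, W, b):
    """T5-Hidden-style head (sketch): a linear layer over the final
    decoder hidden state h, followed by a softmax over the labels.
    h: (d,) hidden state; W: (num_labels, d); b: (num_labels,)."""
    return softmax(W @ h + b)
```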

Evaluation
T5-Sentinel and T5-Hidden are evaluated on the test subset of the OpenLLMText dataset using the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and the F1 score.

Multi-Class Classification
We break down the evaluation of multi-class classification, i.e., identifying the specific LLM responsible for text generation, into one-vs-rest classification for each label.
As presented in Table 5, T5-Sentinel achieves a superior weighted F1 score of 0.931 compared to T5-Hidden.

Human-LLMs Binary Classification
For the generated text detection task, we compare T5-Sentinel against T5-Hidden and two widely adopted baseline classifiers, the AI text detector by OpenAI and ZeroGPT.
Figure 6 displays the ROC curves obtained from our experiments, and the detailed performance metrics such as AUC, accuracy, and F1 score are summarized in Table 1. Additionally, we compare the performance of each classifier on the generated text detection subtask for each LLM source in OpenLLMText, as shown in Table 2. Notably, T5-Sentinel outperforms the baselines across all subtasks in terms of AUC, accuracy, and F1 score. Results of the dataset ablation study show that T5-Sentinel does not rely on unexpected shortcuts. Additionally, we apply t-distributed Stochastic Neighbor Embedding (t-SNE) projection (van der Maaten and Hinton, 2008) to the hidden states of the last decoder block of T5-Sentinel. The resulting t-SNE plot, shown in Figure 7, demonstrates the model's ability to distinguish textual content from different sources, corroborating the evaluation results discussed earlier. For comparison, we also plot the t-SNE projection of T5-Hidden on the test subset of OpenLLMText in Figure 8; the results show that T5-Hidden cannot correctly distinguish LLaMA-generated samples from the other sources. This aligns with the evaluation reported in Table 2.
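The t-SNE projection of decoder hidden states can be sketched as below. Random vectors stand in for the real hidden states, and the dimensions and perplexity are scaled-down assumptions for a toy sample size (the paper's plots use perplexity 100).

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for hidden states of the last decoder block:
# 60 samples with an assumed model dimension of 512.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(60, 512))

# Project to 2-D for visualization; perplexity must be smaller than
# the number of samples, so 10 is used for this toy example.
proj = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(hidden_states)
```

Each row of `proj` is a 2-D point; coloring points by their source label reproduces a plot in the style of Figures 7 and 8.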

Dataset Ablation Study
An ablation study is conducted on the OpenLLMText dataset to further investigate which features T5-Sentinel uses for classification. We design four different cleaning configurations on OpenLLMText: i) compress consecutive newline characters into one; ii) transliterate Unicode characters to ASCII characters; iii) remove all punctuation; iv) cast all characters to lower case. Evaluation results for each one-vs-rest binary classification task in Table 6 show that T5-Sentinel is quite robust to perturbations in the input text. For ablation configurations i), ii), and iv), the AUC and F1 scores are almost identical. However, performance drops significantly under condition iii) (with ∆AUC ≈ −0.3).
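The four cleaning configurations can be sketched as simple string transformations. The exact transliteration table used in the paper is not specified, so NFKD-based ASCII folding is an assumption here.

```python
import re
import string
import unicodedata

def collapse_newlines(text):
    """Config i: compress consecutive newline characters into one."""
    return re.sub(r"\n+", "\n", text)

def to_ascii(text):
    """Config ii: transliterate Unicode to ASCII (NFKD folding is an
    assumed stand-in for the paper's transliteration table)."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

def strip_punctuation(text):
    """Config iii: remove all ASCII punctuation."""
    return text.translate(str.maketrans("", "", string.punctuation))

def lowercase(text):
    """Config iv: cast all characters to lower case."""
    return text.lower()
```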
To verify that T5-Sentinel is not overfitting to specific punctuation, we independently removed each ASCII punctuation mark from the input text and evaluated the performance of the model on each one-vs-rest classification task. Results show that only the removal of periods and commas causes significant performance degradation (shown in Table 6). This suggests that T5-Sentinel uses the syntactic structure of the input sample to distinguish text from Human, GPT3.5, and PaLM, rather than overfitting on these two punctuation marks. In Section 6.2, we confirm this hypothesis with an integrated gradient analysis.

Integrated Gradient Analysis
The integrated gradient method, proposed by Sundararajan et al. (2017), is a robust tool for attributing the prediction of a neural network to its input features. Here, we apply the integrated gradient method to the word embeddings of the input text sample and calculate the integrated gradient of each token using the following formula:

IG_i(x) = (x_i − x_{0,i}) · (1/m) Σ_{k=1}^{m} ∂F(x_0 + (k/m)(x − x_0)) / ∂x_i    (2)

where F denotes the model output, m is the number of interpolation steps, and x_0 denotes the word embedding of an input of the same length as x but filled with the <pad> token, which is considered the baseline input.
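The Riemann approximation in Equation 2 can be sketched as follows. This is a generic illustration: `F_grad` stands in for the gradient of the model output with respect to the embedded input, and a simple analytic function is used in place of T5-Sentinel.

```python
import numpy as np

def integrated_gradients(F_grad, x, x0, m=100):
    """Approximate integrated gradients (Equation 2):
    IG_i = (x_i - x0_i) * (1/m) * sum over k of dF/dx_i evaluated at
    the interpolated point x0 + (k/m) * (x - x0)."""
    total = np.zeros_like(x)
    for k in range(1, m + 1):
        total += F_grad(x0 + (k / m) * (x - x0))
    return (x - x0) * total / m
```

For F(x) = Σ x_i² (gradient 2x) with a zero baseline, the attributions sum approximately to F(x) − F(x_0), illustrating the completeness property of integrated gradients.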
The visualization tool we developed uses Equation 2 with m = 100 to calculate the integrated gradient of each token and shows the attribution of each token to the prediction made by the T5-Sentinel model. Some visualization samples can be found in Appendix D.
We notice substantial gradients on non-punctuation tokens, especially on syntactic structures such as clauses (Sample 2, Appendix D) and semantic structures such as repetitive verbs (Sample 4, Appendix D), indicating that the model is not exclusively overfitting on punctuation tokens. Rather, the drop in performance without punctuation appears to stem from the fact that removing punctuation disrupts the overall semantic structure of the text, which the model has correctly learned.

Conclusion
In conclusion, this paper demonstrates the effectiveness of using next-token prediction to identify the LLM that generated a given text. We contribute by collecting and releasing the OpenLLMText dataset, reframing the classification task as a next-token prediction task, conducting experiments with the T5 model to create T5-Sentinel, and providing insight into the differences in writing style between LLMs through interpretability studies. In addition, we provide compelling evidence that our approach surpasses T5-Hidden and other existing detectors. As it eliminates the requirement for an explicit classifier, our approach stands out for its efficiency, simplicity, and practicality.

Limitations
The OpenLLMText dataset we collected is based on the OpenWebText dataset.
The original OpenWebText dataset collects human-written English content from Reddit, an online discussion website mainly used in North America. Hence, the human entries in the dataset may be biased towards native English speakers' wording and tone. This might lead to degraded performance when a detector trained on the OpenLLMText dataset is given human-written text from non-native English speakers. This tendency to misclassify non-native English writing as machine-generated is also mentioned by Liang et al. (2023).

A.1 Length Distribution
The length distribution of text samples from each source is presented in Figure 9. Since we truncate each text to its first 512 tokens during training and evaluation, the actual length distribution received by the classifier, shown in Figure 10, is approximately the same across sources.

A.2 Punctuation Distribution
Figure 11 shows the distribution of the top 40 ASCII punctuation marks in the OpenLLMText dataset. Most punctuation marks are generated with similar frequency by all LLMs; however, PaLM tends to generate "*" more frequently than the other sources. Further experiments on dataset cleaning (Table 6 in Appendix C) show that T5-Sentinel does not rely on this feature to identify PaLM-generated text.

A.3 Token Distribution
The distribution of the most common tokens from each source is presented in Figure 12. It is worth noting that, while the GPT2 source lacks single and double quotation marks, the overall token distributions from all sources exhibit a consistent pattern.

B Evaluation
The detailed evaluation results for the human-to-LLM binary classification tasks and the one-vs-rest binary classification tasks are listed in Tables 2 and 5, respectively.

Figure 7: t-SNE plot for T5-Sentinel on the test subset of the OpenLLMText dataset with perplexity 100

Figure 9: Distribution of sample length measured by the number of tokens in the OpenLLMText dataset.

Figure 13 displays the word-class distribution (noun, adjective, and others) for each source in the OpenLLMText dataset. The distribution is almost identical across all sources.

Table 2: Evaluation results for T5-Sentinel, T5-Hidden, and baselines on each specific human-to-LLM binary classification task. For the T5-Hidden model, we also test with 5 random initializations and report the standard deviation of the metrics for each task in italics.
Table 6 shows the performance of T5-Sentinel under the different dataset cleaning methods.
Sample. Label: PaLM, Predicted: PaLM. HTC has had a tough year. Revenue is down, it lost a patent suit to Apple, and it drew criticism for pulling the plug on Jelly Bean updates for some of its phones. The company needs a win, and it's hoping that the Droid DNA will be it. The DNA is a high-end phone with a 5-inch 1080p display, the same quad-core Snapdragon chip as the Optimus G and Nexus 4, and 2 GB of RAM. It's a powerful phone with a beautiful screen, but there are some tradeoffs. The first tradeoff is battery life. The DNA's battery is smaller than the batteries in the Optimus G and Galaxy S III, and it doesn't last as long. (...Truncated)

Sample 4. Label: LLaMA, Predicted: LLaMA. ...Barack Obama said he did not expect President-elect Donald Trump to follow his administration's blueprints in dealing with Russia, yet hoped that Trump would "stand up" to Moscow, speaking during a joint press appearance in the White House after meeting Trump. The president also said that while Americans are concerned about Russian interference into last year's election campaign and what it might mean for the future of democracy, there was no evidence that votes were rigged by outside actors at any point, said Obama during a joint press appearance in the White House after meeting Trump. Outgoing US President Barack Obama said that while Americans are concerned about Russian interference into last year (...Truncated)

Sample 5. Label: GPT2, Predicted: GPT2. Generic and other online electric curbside meter sizes. There have been 1150 comparisons between generic sets of inline-3-gauge, single-phase and Cantenna-type electric curbside meters in Ontario over the past 20 years (1975-28), totalling 22.3 M km. Here are samples of current (10-year average) home electric curbside meters from selected suppliers. All currently available meters have a 1" restriction and are marked with the size in decimal format and a code of 1 or PF (...Truncated)