Analyzing Online Political Advertisements

Online political advertising is a central aspect of modern election campaigning for influencing public opinion. Computational analysis of political ads is important in political science for understanding the characteristics of digital campaigning, and in computational linguistics for studying features of political discourse and communication on a large scale. In this work, we present the first computational study of online political ads that aims to (1) infer the political ideology of an ad sponsor; and (2) identify whether the sponsor is an official political party or a third-party organization. We develop two new large datasets for these tasks, consisting of ads from the U.S. Evaluation results show that our approach, which combines textual and visual information from pre-trained neural models, outperforms a state-of-the-art method for generic commercial ad classification. Finally, we provide an in-depth analysis of the limitations of our best-performing models, together with a linguistic analysis of the characteristics of political ad discourse.

Political advertising is defined as 'any controlled message communicated through any channel designed to promote the political interests of individuals, parties, groups, government, or other organizations' (Kaid and Holtz-Bacha, 2006). It is guided by ideology and morals (Scammell and Langer, 2006; Kumar and Pathak, 2012), and often expresses more negativity (Haselmayer, 2019; Iyengar and Prior, 1999; Lau et al., 1999) than the more aesthetic register of commercial advertising. Table 1 shows examples of online political ads across different political parties and sponsor types.
While the closely related online commercial advertising domain has recently been explored in natural language processing (NLP) for predicting the category (e.g. politics, cars, electronics) and sentiment of an ad (Hussain et al., 2017; Kalra et al., 2020), online political advertising has yet to be explored. Large-scale studies of online political advertising have so far focused on understanding targeting strategies rather than developing predictive models for analyzing its content (Edelson et al., 2019; Medina Serrano et al., 2020).
Automatically analyzing political ads is important in political science for researching the characteristics of online campaigns (e.g. voter targeting, sponsors, non-party campaigns, privacy, and misinformation) on a large scale (Scammell and Langer, 2006; Johansson and Holtz-Bacha, 2019). Moreover, identifying ads sponsored by third-party organizations is critical to ensuring transparency and accountability in elections (Liu et al., 2013; Speicher et al., 2018; Fowler et al., 2020b; Edelson et al., 2019). For example, third-party advertising had a considerably larger presence in the 2018 U.S. House and Senate races than in 2012, and almost half of the third-party sponsored ads were funded by dark-money sources (Fowler et al., 2020b). Finally, computational methods for political ad analysis can help linguists to study features of political discourse and communication (Kenzhekanova, 2015; Skorupa and Dubovičienė, 2015).
In this paper, we present a systematic study of online political ads (consisting of text and images) in the U.S., uncovering linguistic and visual cues across political ideologies and sponsor types using computational methods for the first time. Our contributions are: (1) two new binary classification tasks, predicting the political ideology (Conservative/Liberal) and the sponsor type (Political Party/Third-Party) of an ad; (2) two new publicly available datasets of U.S. political ads labeled for each task; (3) an evaluation of textual, visual and multimodal predictive models on both tasks; and (4) an error analysis and a linguistic analysis of political ad discourse.

Related Work

Previous work on analyzing political advertising has covered television and online ads (Kaid and Postelnicu, 2005; Reschke and Anand, 2012; West, 2017; Fowler et al., 2020b). Ridout et al. (2010) analyze a series of YouTube videos posted during the 2008 presidential campaign to understand their influence on election results as well as the actors and formats involved, compared to traditional television ads. Anstead et al. (2018) study how online platforms such as Facebook are used for political communication and identify challenges for understanding the role of these platforms in elections, highlighting the lack of transparency (Caplan and Boyd, 2016). Fowler et al. (2020b) explore differences between television and online ads, and demonstrate that more candidates advertise online than on television.

Political Ideology Prediction
Inferring the political ideology of various types of text, including news articles, political speeches and social media, has been widely studied in NLP (Lin et al., 2008; Gerrish and Blei, 2011; Sim et al., 2013; Iyyer et al., 2014; Preoţiuc-Pietro et al., 2017; Kulkarni et al., 2018; Stefanov et al., 2020). Bhatia and P (2018) exploit topic-specific sentiment analysis for ideology detection (i.e. conservative, liberal) in speeches from the U.S. Congress. Kulkarni et al. (2018) propose a multi-view model that incorporates textual and network information to predict the ideology of news articles. Johnson and Goldwasser (2018) investigate the relationship between political ideology and the language used to express morality by analyzing political slogans in tweets posted by politicians. Maronikolakis et al. (2020) present a study of political parody on Twitter, focusing on the linguistic differences between tweets shared by real and parody accounts. Baly et al. (2019) estimate the trustworthiness and political ideology (left/right bias) of news sources as a multi-task problem. Stefanov et al. (2020) develop methods to predict the overall political leaning (left, center or right) of online media and popular Twitter users. Political ideology and communicative intent have also been studied in computer vision. Political images have been analyzed to infer persuasive intent using various features such as facial display types, body poses, and scene context (Joo et al., 2014; Huang and Kovashka, 2016; Joo and Steinert-Threlkeld, 2018; Bai et al., 2020). Joo et al. (2015) introduce a method that infers the perceived characteristics of politicians from face images and show that those characteristics can be used in election forecasting. Xi et al. (2020) analyze the political ideology of Facebook photographs shared by members of the U.S. Congress.
Other work examines the role of gender-stereotypical cues in photographs posted on social media by political candidates and their relationship to voter support. Hussain et al. (2017) propose the task of ad understanding using vision and language, aiming to predict the topical category, sentiment and rhetoric of an ad (i.e. what the message is about). The latter task has been approached in computer vision as a visual question-answering task, ranking human-generated statements that explain the intent of the ad (Ahuja et al., 2018). More recently in NLP, Kalra et al. (2020) propose a BERT-based (Devlin et al., 2019) model for this task using the text and visual descriptions of the ad (Johnson et al., 2016). Thomas and Kovashka (2018) study the persuasive cues of faces across ad categories (e.g. beauty, clothing). Zhang et al. (2018) explore the relationship between the text of an ad and its visual content to analyze the semantics across modalities. Other work integrates audio and visual modalities to predict the climax of an advertisement (i.e. stress levels) using sentiment annotations.

Tasks & Data
We aim to analyze the political ideology of ads consisting of image and text, and the type of the ad sponsor, for the first time. To this end, we present two new binary classification tasks motivated by related studies in political communication (Grigsby, 2008; Fowler et al., 2020b):

• Task 1: Conservative/Liberal. The aim is to label an ad according to the political party that sponsored it, either as Conservative (assuming that the dominant ideology of the Republican Party is conservatism) or Liberal (assuming that the dominant ideology of the Democratic Party is social liberalism) (Grigsby, 2008);

• Task 2: Political Party/Third-Party. The goal is to classify an ad according to the type of organization that sponsored it. We distinguish between ads sponsored by official political parties and non-political organizations, such as businesses and non-profit groups, following Fowler et al. (2020b).
To the best of our knowledge, no datasets are available for modeling these two tasks. Therefore, we develop two new publicly available datasets consisting of political ads and ideology/sponsor type labels from the U.S. We opted to use data only from the U.S. because its Federal Election Commission (FEC) provides publicly available information on political ad sponsors: official political parties (e.g. Democratic, Republican) can be identified via their FEC ID, and third-party organizations via their Employer Identification Number (EIN), which makes the data suitable for our study.

Collecting Online Political Ads
We use the public Google transparency report platform to collect political ads. This platform contains information on verified political advertisers (i.e. sponsors) and provides links to the actual political ads from Google Ad Services.
We collect all available U.S. data from the Google platform, consisting of ads published from May 31, 2018 up to October 11, 2020 (there is no data prior to 2018). This corresponds to a total of 168,146 image ads. Each ad is associated with a URL that links to its summary metadata, consisting of a URL to the original image file and sponsor information, i.e. name and FEC ID, state elections registration or EIN ID. 7 We scrape all available image files, resulting in a total of 158,599 ads, which corresponds to 94.32% of all ads in the Google database. The remaining ads were unavailable because they violated Google's Advertising Policy, their summary metadata was missing, or the file URL was not included in the metadata.

Extracting Text and Visual Information
Before labeling the ads with ideology and sponsor type, we extract two types of information from the images: (1) the text contained in each ad (Image Text; IT) using the Google Vision API; 8 and (2) the descriptive caption or densecap (D) of the image using the DenseCap API, 9 following the method proposed by Kalra et al. (2020) for commercial ad classification. This way, we obtain both the actual text appearing on the ad and textual descriptions of the ad, such as the entities in the image, their characteristics and their relationships. Table 2 shows an example of an ad consisting of an image, its text and its densecap.
We use the textual and visual information to eliminate all duplicate images by comparing the URL of the image, its text and its densecap. Finally, we filter out all ads that contain non-English text (i.e. IT). 10 This results in 15,116 unique ads from 665 unique ad sponsors.

7 All ad sponsors must apply for eligibility verification in order to publish political ads on Google platforms: https://support.google.com/displayvideo/answer/9014141
8 https://cloud.google.com/vision/docs/ocr
9 https://deepai.org/machine-learning-model/densecap
10 https://pypi.org/project/langdetect/
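The de-duplication step can be sketched as follows. The `Ad` record and its field names are assumptions for illustration (they are not the authors' actual schema), and the non-English filter via langdetect is omitted:

```python
# Illustrative sketch: two ads are treated as duplicates when their image
# URL, extracted text (IT) and densecap (D) all match.
from dataclasses import dataclass

@dataclass(frozen=True)
class Ad:
    url: str         # URL of the original image file
    image_text: str  # OCR text extracted from the ad (IT)
    densecap: str    # descriptive caption of the image (D)

def deduplicate(ads):
    """Keep the first occurrence of each (url, IT, D) triple."""
    seen, unique = set(), []
    for ad in ads:
        key = (ad.url, ad.image_text, ad.densecap)
        if key not in seen:
            seen.add(key)
            unique.append(ad)
    return unique

ads = [
    Ad("http://a/1.png", "VOTE <person>", "a man wearing a hat"),
    Ad("http://a/1.png", "VOTE <person>", "a man wearing a hat"),  # duplicate
    Ad("http://a/2.png", "Donate today", "a blue sign"),
]
assert len(deduplicate(ads)) == 2
```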

Labeling Ads with Political Ideology
Our aim is to label political ads as Conservative or Liberal (Task 1). In total, we collect 242 unique sponsors corresponding to 5,548 ads. Liberal ads represent 39% of the total and the remaining 61% are Conservative.

Labeling Ads with Sponsor Type
We first label all ads from sponsors that have an associated FEC ID in the Google database as Political Party. These sponsors correspond to official political committees affiliated with the Democratic or Republican parties (e.g. Biden for President).
Third-party sponsors of political ads are groups not officially associated with any political party, such as not-for-profit organizations (e.g. NRDC Action Fund) and businesses (Fowler et al., 2020b). These sponsors are identified via their EIN ID (included in the Google database). Thus, we label all ads linked to an EIN ID as Third-Party. We collected a total of 15,116 ads, of which 37% correspond to Political Party and 63% to Third-Party.

Data Splits
We split both datasets chronologically into train (80%), development (10%), and test (10%) sets. Table 3 shows the dataset statistics and splits for each task.
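A chronological split of this kind can be sketched as follows, assuming each ad record carries a publication date (the field names are hypothetical):

```python
# Sort ads by date, then take the oldest 80% for training, the next 10%
# for development, and the most recent 10% for testing.
def chronological_split(ads, train_frac=0.8, dev_frac=0.1):
    ads = sorted(ads, key=lambda ad: ad["date"])  # oldest first
    n = len(ads)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    train = ads[:n_train]
    dev = ads[n_train:n_train + n_dev]
    test = ads[n_train + n_dev:]
    return train, dev, test

# Toy records with ISO dates so lexicographic order equals time order.
ads = [{"id": i, "date": f"2019-{(i % 12) + 1:02d}-01"} for i in range(10)]
train, dev, test = chronological_split(ads)
assert (len(train), len(dev), len(test)) == (8, 1, 1)
# Every training ad precedes every test ad in time.
assert max(a["date"] for a in train) <= min(a["date"] for a in test)
```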

Data Preprocessing
Text We normalize the text from the image (IT) and the densecap (D) by lower-casing, and by replacing all URLs and person names with placeholder tokens. To identify person names, we use the Stanford NER tagger (Finkel et al., 2005). We also replace tokens that appear in fewer than five ads with an 'unknown' token. We tokenize the text using the NLTK tokenizer (Bird et al., 2009).
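The normalization steps can be sketched as follows. A simple regex stands in for URL detection and a pre-computed name list stands in for the Stanford NER tagger, so this is an illustration rather than the authors' exact pipeline:

```python
import re
from collections import Counter

def normalize(text, person_names=()):
    """Lower-case, then replace URLs and person names with placeholders.
    The paper uses the Stanford NER tagger to find names; here we take a
    pre-computed name list as a stand-in."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "<url>", text)
    for name in person_names:
        text = text.replace(name.lower(), "<person>")
    return text

def replace_rare_tokens(tokenized_ads, min_ads=5):
    """Replace tokens appearing in fewer than `min_ads` ads with <unk>.
    Frequency is counted per ad (document frequency), not per occurrence."""
    df = Counter(tok for toks in tokenized_ads for tok in set(toks))
    return [[tok if df[tok] >= min_ads else "<unk>" for tok in toks]
            for toks in tokenized_ads]

ad = "Vote for John Smith! Learn more at https://example.com/donate"
norm = normalize(ad, person_names=["John Smith"])
assert norm == "vote for <person>! learn more at <url>"
```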

Predictive Models
We experiment with textual, visual and multimodal models for political ad classification.

Linear Baselines
As baseline models, we use logistic regression with bag of n-grams and L2 regularization using (1) the image text (LR IT ); (2) densecap (LR D ); and (3) their concatenation (LR IT +D ) for representing each ad.
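A minimal sketch of such a baseline using scikit-learn; the toy ads, labels and n-gram range are invented for illustration, and the paper does not specify this implementation:

```python
# Logistic regression over bag-of-n-gram counts with L2 regularization
# (sklearn's default penalty). For LR_IT+D, the image text and densecap
# strings would simply be concatenated before vectorization.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "proud conservative values secure the border",
    "lower taxes strong republican leadership",
    "protect our climate end gun violence",
    "healthcare is a right for every family",
]
labels = ["Conservative", "Conservative", "Liberal", "Liberal"]

lr_it = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams (assumed)
    LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000),
)
lr_it.fit(texts, labels)
assert lr_it.predict([texts[0]])[0] == "Conservative"
```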

BERT
We also test three models proposed by Kalra et al. (2020) for generic ad classification, which achieve state-of-the-art performance. The models are based on Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) using a combination of the image text and the densecap. We follow a similar approach and fine-tune BERT for predicting the corresponding class in each task by adding an output dense layer for binary classification that receives the 'classification' [CLS] token as input. We use three types of inputs for each ad: (1) image text (BERT IT ); (2) densecap (BERT D ); and (3) their concatenation (BERT IT +D ).
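The output layer described above can be sketched in PyTorch. A dummy mean-of-embeddings encoder stands in for pre-trained BERT so the example runs without downloading weights; only the classification head reflects the setup described in the text:

```python
import torch
import torch.nn as nn

class BinaryAdClassifier(nn.Module):
    """A dense output layer over the 768-d [CLS] representation,
    producing a single binary logit per ad."""
    def __init__(self, encoder, hidden_size=768):
        super().__init__()
        self.encoder = encoder  # a pre-trained BERT in the actual setup
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, input_ids):
        cls_vec = self.encoder(input_ids)            # (batch, 768)
        return self.classifier(cls_vec).squeeze(-1)  # (batch,) logits

class MeanEncoder(nn.Module):
    """Stand-in for BERT: mean of token embeddings (NOT the real model)."""
    def __init__(self, vocab_size=100, hidden_size=768):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids):
        return self.emb(input_ids).mean(dim=1)

model = BinaryAdClassifier(MeanEncoder())
logits = model(torch.randint(0, 100, (4, 16)))  # 4 ads, 16 tokens each
assert logits.shape == (4,)
```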

EfficientNet
EfficientNet (Tan and Le, 2019) is a family of Convolutional Neural Network (CNN) (LeCun et al., 1995) models that achieves state-of-the-art accuracy on ImageNet (Deng et al., 2009). In particular, we use EfficientNet-B3 and fine-tune it for political ad classification by adding an output dense layer for each binary classification task.

BERT+EffN
We finally test two multimodal models: (1) BERT IT combined with EfficientNet (BERT IT +EffN); and (2) BERT IT +D combined with EfficientNet (BERT IT +D +EffN). We concatenate the text representation obtained by BERT (768 dimensions) and the visual representation from EfficientNet (1536 dimensions) into a single 2,304-dimensional vector, which is then passed to an output layer for binary classification. We fine-tune the entire architecture for each task.
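The late-fusion step can be sketched as follows; the encoders producing the 768-d and 1536-d vectors are omitted and replaced by random tensors:

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Concatenate BERT's 768-d text vector with EfficientNet-B3's 1536-d
    image vector and classify with a single dense output layer, as in the
    BERT+EffN models described above."""
    def __init__(self, text_dim=768, image_dim=1536):
        super().__init__()
        self.classifier = nn.Linear(text_dim + image_dim, 1)

    def forward(self, text_vec, image_vec):
        fused = torch.cat([text_vec, image_vec], dim=-1)  # (batch, 2304)
        return self.classifier(fused).squeeze(-1)         # (batch,) logits

fusion = MultimodalFusion()
text_vec = torch.randn(4, 768)    # e.g. BERT [CLS] output
image_vec = torch.randn(4, 1536)  # e.g. pooled EfficientNet-B3 features
assert fusion(text_vec, image_vec).shape == (4,)
```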

Experimental Setup
We select the hyperparameters for all neural models using early stopping, monitoring the validation binary cross-entropy loss, and we estimate the class weights using the 'balanced' heuristic (King and Zeng, 2001) for each task, as both datasets are imbalanced. BERT and EfficientNet models use the ADAM optimizer (Kingma and Ba, 2014), and all experiments run on a single GPU (Nvidia V100).
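The 'balanced' heuristic can be computed directly. We assume the standard formulation w_c = n_samples / (n_classes * n_c), as implemented for example in scikit-learn's `compute_class_weight`:

```python
# Each class weight is inversely proportional to its frequency, so the
# minority class contributes proportionally more to the loss.
from collections import Counter

def balanced_class_weights(labels):
    """'Balanced' heuristic: w_c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# Task 1 class ratio from the paper: 61% Conservative, 39% Liberal.
labels = ["Conservative"] * 61 + ["Liberal"] * 39
w = balanced_class_weights(labels)
assert round(w["Conservative"], 4) == round(100 / (2 * 61), 4)
assert w["Liberal"] > w["Conservative"]  # minority class up-weighted
```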
EfficientNet We use EfficientNet-B3 with Noisy Student weights (Xie et al., 2020). For ideology prediction, we first freeze the layers of the EfficientNet (Tan and Le, 2019) model and train for 11 epochs with learning rate η = 1e-3 to learn the parameters of the output layer. We then unfreeze and train the whole network for another 30 epochs with η = 1e-4, as unfreezing the CNN during the later stages of training has been shown to improve performance (Faghri et al., 2017). For predicting the sponsor type, we train for 45 epochs with η = 1e-2, keeping the EfficientNet layers frozen; unfreezing the base model did not result in lower validation loss. We use a dropout rate of 0.2 before passing the output of EfficientNet to the classification layer. The average training time is 37.8 minutes.
BERT+EffN For ideology prediction, we freeze all layers of the pre-trained models (BERT and EfficientNet) apart from the classification layer and train for 27 epochs with η = 1e-3. We then fine-tune BERT for 30 epochs with η = 1e-5. For sponsor type prediction, we freeze all EfficientNet layers and fine-tune BERT for 30 epochs with η = 2e-6. We train in stages to ensure that the parameters of each part of the model (textual and visual) are properly updated (Kiela et al., 2019). The average training time is 56.65 minutes.
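The staged freeze/unfreeze schedule described above can be sketched in PyTorch; the toy linear layers stand in for the actual BERT and EfficientNet encoders:

```python
import torch.nn as nn

def set_trainable(module, trainable):
    """Freeze or unfreeze all parameters of a sub-network."""
    for p in module.parameters():
        p.requires_grad = trainable

# Toy two-part model standing in for the textual/visual encoders.
encoder = nn.Linear(8, 8)
head = nn.Linear(8, 1)

# Stage 1: train only the classification head.
set_trainable(encoder, False)
assert all(not p.requires_grad for p in encoder.parameters())
assert all(p.requires_grad for p in head.parameters())

# Stage 2: unfreeze the encoder for end-to-end fine-tuning
# (typically with a lower learning rate, as in the experiments above).
set_trainable(encoder, True)
assert all(p.requires_grad for p in encoder.parameters())
```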

Results
This section presents the experimental results for the two predictive tasks, political ideology and sponsor type prediction (§3), using the methods described in §4. We evaluate our models using macro precision, recall and F1 score, since the data in both tasks is imbalanced. Note that for all models we report the average and standard deviation over three runs using different random seeds. We also report the majority class baseline for each task.
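To make the evaluation concrete, a minimal reference implementation of macro-averaged precision, recall and F1 (equivalent to scikit-learn's `average='macro'` setting, which we assume but the paper does not name):

```python
# Macro averaging: compute per-class precision/recall/F1, then average
# with equal class weight, which is appropriate for imbalanced data.
def macro_prf(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

y_true = ["Lib", "Lib", "Con", "Con", "Con"]
y_pred = ["Lib", "Con", "Con", "Con", "Con"]
p, r, f = macro_prf(y_true, y_pred)
assert round(p, 3) == 0.875 and round(r, 3) == 0.75
```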

Predictive Performance
Task 1: Conservative/Liberal Table 5 shows the results for political ideology prediction. We first observe that BERT IT (73.16%), which uses the image text as input, outperforms BERT D (57.64%) and EfficientNet (68.15%). Moreover, combining image text and densecap (BERT IT +D ) leads to higher performance than using only image text (BERT IT ), i.e. 75.49% vs. 73.16% F1 respectively. This indicates that combining textual with visual information (in the form of image descriptions) improves model performance.
Finally, using all visual information sources, i.e. densecaps and image representation from Efficient-Net (BERT IT +D +EffN), further improves performance achieving the highest macro F1 (75.76%) across models, followed by BERT IT +D (75.49%).
Task 2: Political-Party/Third-Party Table 6 shows the results for sponsor type prediction. The best overall performance is obtained by BERT IT +D +EffN (87.36%), which combines image and textual information. BERT IT +D (86.90%) and LR IT +D (86.54%) follow very closely. By inspecting our data, we identified the presence of noise in the image text; in particular, sentences are interrupted by logos and other aesthetic elements. This negatively affects the performance of BERT, because such models are usually pre-trained on 'cleaner' generic corpora. On the other hand, LR models trained from scratch can adapt to the noisy text (see §6.2 for error analysis).
Overall, our results in both tasks suggest that text is a stronger modality than visual information extracted from the images for inferring the political ideology and sponsor type of political ads. However, integrating visual information, in the form of text descriptions (densecaps) or representations obtained by pre-trained image classification models, enhances model performance.

Error Analysis
We further perform an error analysis to examine the behavior of our best performing models (BERT IT +D +EffN and BERT IT +D ) and identify potential limitations.
The ad shown in Fig. 1 (a) was misclassified as Conservative by BERT IT +D and BERT IT +D +EffN. This particular ad requires common knowledge of social issues (e.g. inadequate health support) that are often discussed in political campaigns to inform voters about a party's views on the issue (Scammell and Langer, 2006). This makes the classification task difficult for the models, since it requires contextual knowledge. Incorporating relevant external knowledge into the models (e.g. from political speeches, interviews or public meetings) might improve performance (Lin et al., 2018).
The ad depicted in Fig. 1 (b) was also misclassified as Conservative by BERT IT +D and BERT IT +D +EffN. After analyzing the densecap descriptions, we found that this information tends to be noisy. For this particular example, it contains descriptions such as 'a man is holding a horse', 'the sign is blue', 'a blue and white stripe shirt', and 'a man wearing a hat'. In fact, BERT IT , which only takes the image text into account, classified this ad correctly as Conservative. Improving the quality of the image descriptions (e.g. by pre-training on advertising or political images, or by capturing specific attributes such as 'military hat') might be beneficial for these models.

Fig. 1 (c) shows an example of a Political Party ad misclassified by BERT IT +D +EffN as Third-Party. The ad's text has a confrontational and divisive tone that is common in Third-Party ads (Edelson et al., 2019), but is also typically used as a political tactic for negative campaigning (Skaperdas and Grofman, 1995; Gandhi et al., 2016; Haselmayer, 2019).

Table 7: Feature correlations with Conservative/Liberal ads, sorted by Pearson correlation (r). All correlations are significant at p < .01, two-tailed t-test.
Finally, Fig. 1 (d) shows an example of a Third-Party ad misclassified as Political Party by BERT IT +D +EffN. The text content promotes voter participation (e.g. 'Vote'), a characteristic of Political Party advertising (see Table 8). However, one of the aims of Third-Party advertising is precisely to encourage voting and activism (Dommett and Temple, 2018).
There is a considerable performance difference between the models using visual information only (LR D , BERT D , EfficientNet) and those that also use the ad text as input (IT, IT+D). Our intuition is that the former get confused by shapes, colors and other aesthetic features that are domain-specific and appear frequently in political advertisements (Sartwell, 2011). For instance, several ads in the Third-Party category include buttons linking to websites (see Fig. 1 (c), (d)). However, Political Party ads also make use of these types of buttons, to link users to donation or informative websites (Edelson et al., 2019).

Linguistic Analysis
We perform an analysis based on our new datasets to study the linguistic characteristics of political ads, first analyzing the specific features of each class in both tasks. For this purpose, we use the method introduced by Schwartz et al. (2013) to analyze unigram features from the image text (see §4) using univariate Pearson correlation. Features are normalized to sum to one for each ad. For each feature, we compute the correlation between its distribution across ads and the ad label (Conservative/Liberal or Political Party/Third-Party). Table 7 presents the top unigrams correlated with Liberal and Conservative ads. We first notice that the top words in the Conservative category are closely related to its ideology, such as 'conservative' and 'republican'. Other prominent terms are words related to current political issues, such as immigration (e.g. 'border') and taxation (e.g. 'taxes'). These are examples of emotionally evocative terms (e.g. anger about taxes) that are frequently used in political campaigns to influence voters (Brader, 2005). Top terms in Liberal ads include 'necessary', 'end', 'values', and 'win'. For example, the following ads belong to the Liberal class:

Conservative vs. Liberal
I'm supporting <person> because he has the same values that I do and he's an honest person.

<person> FOR CONGRESS
To End Gun Violence

These are examples of ads containing a combination of moral and controversial topics (e.g. gun regulation), which are typical characteristics of political advertising (Kumar and Pathak, 2012).

Table 8 shows the top unigram features correlated with the sponsor type of an ad (Political Party/Third-Party). We observe that some top terms in the Political Party class also belong to the top terms of the political ideology task (see Table 7), such as 'committee', 'republican' and 'senate'. Messages calling for votes and donations ('vote', 'donate', '$') are also prevalent in Political Party ads (Fulgoni et al., 2016), as in the next example (see Fig. 1 (b)):

Political Party vs. Third-Party
Making sure our veterans get the care they've earned

VOTE FOR <person>
On the other hand, top features in the Third-Party category (e.g. 'action', 'protect') share common characteristics with the rhetoric used by media outlets focused on promoting specific political messaging (Edelson et al., 2019; Dommett and Temple, 2018). Many of these ads direct people to websites to read about a particular topic. For example:

Is <person> HIDING ANTI-GUN VIEWS? Learn More

This ad belongs to the Third-Party class and points the viewer to an external website for further details.
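The univariate Pearson correlation underlying this analysis can be sketched as follows; the frequencies below are invented for illustration and are not from the datasets:

```python
# Correlate a unigram's normalized frequency across ads (x) with a
# binary ad label (y), as in the Schwartz et al. (2013)-style analysis.
from math import sqrt

def pearson_r(x, y):
    """Plain univariate Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy example: relative frequency of 'republican' in six ads, with
# labels 1 = Conservative, 0 = Liberal (values are illustrative only).
freq = [0.10, 0.08, 0.05, 0.00, 0.01, 0.00]
label = [1, 1, 1, 0, 0, 0]
r = pearson_r(freq, label)
assert r > 0.8  # strongly correlated with the Conservative class
```

In practice the significance values reported in Table 7 would come from the accompanying p-value (e.g. via scipy.stats.pearsonr).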

Conclusion
We have presented the first study in NLP analyzing the language of political ads, motivated by prior studies in political communication. We have introduced two new publicly available datasets containing political ads from the U.S. in English, labeled by (1) the ideology of the sponsor (Conservative/Liberal); and (2) the sponsor type (Political Party/Third-Party). We have defined both tasks as advertisement-level binary classification and evaluated a variety of approaches, including textual, visual and multimodal models, reaching macro F1 of up to 75.76% and 87.36% on the two tasks respectively.
In the future, we aim to incorporate other modalities such as speech, and video, and explore other methods of acquiring and integrating multimodal information. In addition, we aim to extend our work for analyzing political advertising discourse across different regions, languages and platforms.
Acknowledgments

is supported by the Centre for Doctoral Training in Speech and Language Technologies (SLT) and their Applications funded by the UK Research and Innovation grant EP/S023062/1. NA is supported by a Leverhulme Trust Research Project Grant.

Ethics Statement
Our work complies with the Terms of Service of the Google Political Ads Dataset. We provide, for reproducibility purposes, the list of ad IDs and corresponding labels used for each task, as well as the data splits (train, development, test). All data used in this paper is in English. The ad information can be retrieved from Google according to their policy.