Multi-perspective Coherent Reasoning for Helpfulness Prediction of Multimodal Reviews

As more and more product reviews are posted in both text and images, Multimodal Review Analysis (MRA) becomes an attractive research topic. Among existing review analysis tasks, helpfulness prediction on review text has become predominant due to its importance for e-commerce platforms and online shops, i.e., helping customers quickly acquire useful product information. This paper proposes a new task, Multimodal Review Helpfulness Prediction (MRHP), which aims to analyze review helpfulness from the text and visual modalities. Meanwhile, a novel Multi-perspective Coherent Reasoning method (MCR) is proposed to solve the MRHP task, which conducts joint reasoning over texts and images from both the product and the review, and aggregates the signals to predict review helpfulness. Concretely, we first propose a product-review coherent reasoning module to measure the intra- and inter-modal coherence between the target product and the review. In addition, we devise an intra-review coherent reasoning module to identify the coherence between the text content and images of the review, which is a piece of strong evidence for review helpfulness prediction. To evaluate the effectiveness of MCR, we present two newly collected multimodal review datasets as benchmark evaluation resources for the MRHP task. Experimental results show that our MCR method yields a performance increase of up to 8.5% over the best-performing text-only model. The source code and datasets can be obtained from https://github.com/jhliu17/MCR.

Product reviews have become an important reference for consumers to make purchase decisions. Many e-commerce sites such as Amazon.com offer reviewing functions that encourage consumers to share their opinions and experiences. However, user-generated reviews vary greatly in quality, and consumers are continuously bombarded with an ever-growing stream of noisy information. Therefore, it is critical to examine the quality of reviews and present consumers with useful ones.
Motivated by the demand for gleaning insights from such valuable data, review helpfulness prediction has gained increasing interest from both academia and industry. Earlier review helpfulness prediction methods rely on a wide range of handcrafted features, such as semantic features (Yang et al., 2015), lexical features (Martin and Pu, 2014), and argument-based features (Liu et al., 2017), to train a classifier. The success of these methods relies heavily on feature engineering, which is labor-intensive and highlights a weakness of conventional machine learning methods. In recent years, deep neural networks such as CNNs (Chen et al., 2018, 2019) and LSTMs (Fan et al., 2019) have become dominant in the literature, achieving strong helpfulness prediction performance by learning text representations automatically. Note that these existing works on review helpfulness prediction focus mainly on purely textual data.
As multimodal data become increasingly popular in online reviews, Multimodal Review Analysis (MRA) has become a valuable research direction. In this paper, we propose the Multimodal Review Helpfulness Prediction (MRHP) task, which aims to explore multimodal clues that often convey comprehensive information for review helpfulness prediction. In particular, for multimodal reviews, helpfulness is determined not only by the textual content but also by the combined expression (e.g., coherence) of the multimodal data (e.g., texts and images). Taking the reviews in Table 1 as an example, we cannot identify the helpfulness score of Review 3 solely from the text content until reading the attached images, which are totally irrelevant to the product "Teflon Pans". Reviews with incoherent text content and images tend to be unhelpful, or even malicious. In contrast, a helpful review (e.g., Review 2) should contain not only concise and informative textual content but also coherent text content and images.
In this paper, we explore both text and images in product reviews to improve the performance of review helpfulness prediction. We design a novel Multi-perspective Coherent Reasoning method (denoted as MCR) to tackle the MRHP task. Concretely, we propose a product-review coherent reasoning module to effectively capture the intra- and inter-modal coherence between the target product and the review. In addition, we devise an intra-review coherent reasoning module to capture the coherence between the text content and images of the review, which is a piece of strong evidence for review helpfulness prediction. Finally, we formulate helpfulness prediction as a ranking problem and employ a pairwise ranking objective to optimize the whole model.
We summarize our main contributions as follows.
(1) To the best of our knowledge, this is the first attempt to explore both text and images in reviews for helpfulness prediction, which we define as the MRHP task. (2) We propose a multi-perspective coherent reasoning method for the MRHP task that conducts joint reasoning over texts and images from both the product and the review, and aggregates the signals to predict the helpfulness of multimodal reviews. (3) We present two newly collected multimodal review datasets for helpfulness prediction of multimodal reviews, and we release the datasets and source code to facilitate research in this field. (4) Extensive experiments on the two collected datasets demonstrate that our MCR method significantly outperforms other methods.

Related Work
Most conventional approaches to review helpfulness prediction focus solely on the text of reviews. They can generally be divided into two categories based on the way predictive features are extracted: machine learning based methods with hand-crafted features (Kim et al., 2006; Krishnamoorthy, 2015) and deep learning based methods (Chen et al., 2019; Fan et al., 2018; Chen et al., 2018). The machine learning based methods employ domain-specific knowledge to extract a variety of hand-crafted features, such as structure features (Kim et al., 2006), lexical features (Krishnamoorthy, 2015), emotional features (Martin and Pu, 2014), and argument features (Liu et al., 2017), from the textual reviews, which are then fed into conventional classifiers such as SVM (Kim et al., 2006) for helpfulness prediction. These methods rely heavily on feature engineering, which is time-consuming and labor-intensive. Motivated by the remarkable progress of deep neural networks, several recent studies attempt to automatically learn deep features from textual reviews. Chen et al. (2019) employs a CNN model to capture multi-granularity (character-level, word-level, and topic-level) features for helpfulness prediction. Fan et al. (2018) proposes a multi-task neural learning model to identify helpful reviews, in which the primary task is helpfulness prediction and the auxiliary task is star rating prediction.

Product Information: Teflon Pans 1 Set of 3 pcs 1042-Non-stick Set of 3
Review 1 (Helpfulness Score: 2): Overall, it is quite satisfactory. Thanks to the seller.
Review 2 (Helpfulness Score: 4): For that price, it is more than satisfactory, even though there are a few scratches in the pan and the small frying pan, the package is very neat, the frying pan has been used as if it's a little burnt, it looks like it can't stand the heat, but overall I like it.
Review 3 (Helpfulness Score: 0): Recommend for the price. Yes, the package is neat but the pan has scratched. It is unfortunate for the delivery. I ordered 4 items in this shop. but the postage has to pay double and quite very expensive.
Table 1: Example of multimodal reviews under the same product "Teflon Pan". Review 1: The brief review text is insufficient to predict its helpfulness for the corresponding product, while the images provide a rich semantic supplement. Review 2: A helpful review with good coherence between text and images. Review 3: An irrelevant image is attached to the review.
Subsequently, several works have explored not only the reviews but also the users and target products for review helpfulness prediction. Fan et al. (2019) argued that the helpfulness of a review should be aware of the meta-data (e.g., title, brand, category, description) of the target product besides the textual content of the review itself. To this end, a deep neural architecture was proposed to capture the intrinsic relationship between the meta-data of a product and its numerous reviews. Qu et al. (2020) proposed to leverage the reviews, users, and items together for review helpfulness prediction and devised category-aware graph neural networks, with one shared and many item-specific graph convolutions, to learn the common features and each item's specific criterion for helpfulness prediction. Different from the above methods, we take full advantage of the text content and images of reviews by proposing a novel multi-perspective coherent reasoning method that learns both the coherence between the text content and images within a review and the coherence between the target product and the review.

Methodology
The overall architecture of our MCR method is illustrated in Figure 1. Our multi-perspective coherent reasoning consists of two perspectives of coherence: (i) the intra-and inter-modal coherence between a review and the target product and (ii) the intra-review coherence between the text content and images in the review. In the following sections, we will provide the problem definition of review helpfulness prediction and introduce each component of our MCR model in detail.

Problem Definition
As mentioned by Diaz and Ng (2018), we formulate the multimodal review helpfulness prediction problem as a ranking task. Specifically, given a product item $P_i$ consisting of product-related information $p_i$ and an associated review set $R_i = \{r_{i,1}, \cdots, r_{i,N}\}$, where $N$ is the number of reviews for $p_i$, each review has a scalar label $s_{i,j} \in \{0, \cdots, S\}$ indicating the helpfulness score of the review $r_{i,j}$. The ground-truth ranking of $R_i$ is the descending order determined by the helpfulness scores. The goal of review helpfulness prediction is to predict helpfulness scores for $R_i$ that rank the reviews into the ground-truth order. The predicted helpfulness score $\hat{s}_{i,j}$ for the review $r_{i,j}$ is defined as:

$$\hat{s}_{i,j} = f(p_i, r_{i,j}),$$

where $f$ is the helpfulness prediction function taking a product-review pair $(p_i, r_{i,j})$ as input. In the multimodal review helpfulness prediction task, the product $p_i$ consists of an associated description $T_p$ and pictures $I_p$, while the review $r_{i,j}$ consists of user-posted text $T_r$ and images $I_r$.
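As a minimal illustration of this ranking formulation, the setup can be sketched as follows; the toy scoring function `f` (simple token overlap) is a hypothetical stand-in for the learned model, and all names here are for illustration only:

```python
# Sketch of the MRHP ranking formulation. The scoring function `f` is a toy
# stand-in for the full MCR model: it counts tokens shared between the
# product description and the review text.
def f(product_text, review_text):
    return len(set(product_text.split()) & set(review_text.split()))

def rank_reviews(product_text, reviews):
    """Return review indices sorted by predicted helpfulness (descending)."""
    scores = [f(product_text, r) for r in reviews]
    return sorted(range(len(reviews)), key=lambda i: -scores[i])

product = "non-stick teflon pan set"
reviews = [
    "nice seller",                             # no product overlap
    "the teflon pan set is truly non-stick",   # high overlap
    "package arrived",                         # no overlap
]
order = rank_reviews(product, reviews)
print(order)  # [1, 0, 2]: review 1 ranks first
```

The predicted order is then compared against the descending sort by gold helpfulness scores.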

Feature Representation
Given a text ($T_p$ or $T_r$) consisting of $l_T$ text tokens $\{w_1, \cdots, w_{l_T}\}$ and an image set ($I_p$ or $I_r$), we adopt a convolutional neural network to learn the contextualized text representation. Meanwhile, we use a self-attention mechanism over image region features to obtain the image representations. To avoid confusion, we use the subscripts $p$ and $r$ to indicate variables related to the product and the review, respectively.
Text Representation Inspired by the great success of convolutional neural networks (CNN) in natural language processing (Kim, 2014; Dai et al., 2018), we also apply a CNN to learn the text representation. First, we convert each token $w_i$ into an embedding vector $\mathbf{w}_i \in \mathbb{R}^d$ via an embedding layer. Then, we pass the learned word embeddings to a one-dimensional CNN to extract multi-gram representations. Specifically, the $k$-gram CNN transforms the token embedding vectors $\mathbf{w}_i$ into $k$-gram representations $H_k$:

$$H_k = \mathrm{CNN}_k(\mathbf{w}_1, \cdots, \mathbf{w}_{l_T}),$$

where $k \in \{1, \cdots, k_{max}\}$ represents the kernel size, $k_{max}$ is the maximum kernel size, and $H_k \in \mathbb{R}^{l_T \times d_T}$ is the $k$-gram representation. All the $k$-gram representations are stacked to form the final text representation, denoted as $H$. We use $H_p$ and $H_r$ to represent the text representations of the product and the review, respectively.
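The multi-gram text encoder described above can be sketched as follows; the 'same' padding, the ReLU activation, and all dimension choices are our assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def kgram_conv(W_emb, kernel, k):
    """1-D convolution with 'same' padding: (l_T, d) -> (l_T, d_T).

    `kernel` has shape (k, d, d_T); this mirrors the k-gram CNN described
    above (the padding and ReLU choices are our assumptions).
    """
    l_T, d = W_emb.shape
    d_T = kernel.shape[2]
    pad = k // 2
    padded = np.pad(W_emb, ((pad, k - 1 - pad), (0, 0)))
    H_k = np.zeros((l_T, d_T))
    for i in range(l_T):
        window = padded[i:i + k]  # (k, d) token window
        H_k[i] = np.maximum(np.einsum("kd,kdt->t", window, kernel), 0.0)
    return H_k

l_T, d, d_T = 6, 8, 4
W_emb = rng.normal(size=(l_T, d))
# one kernel per k in {1, 3, 5}, then stack the k-gram maps into H
H = np.stack([kgram_conv(W_emb, rng.normal(size=(k, d, d_T)), k)
              for k in (1, 3, 5)])
print(H.shape)  # (3, 6, 4): k_max x l_T x d_T
```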

Image Representation
We use a pre-trained Faster R-CNN to extract region of interest (RoI) pooling features (Anderson et al., 2018) for the review and product images, obtaining fine-grained object-aware representations. All the RoI features $v_i$ extracted from the image sets $I_p$ and $I_r$ are then encoded by a self-attention module (Vaswani et al., 2017), resulting in a $d_I$-dimensional semantic space with non-local understanding:

$$V = \mathrm{SelfAttn}(v_1, \cdots, v_{l_I}),$$

where $V \in \mathbb{R}^{l_I \times d_I}$ represents the visual semantic representation and $l_I$ is the number of extracted RoI features. We use $V_p$ and $V_r$ to represent the product and review image features, respectively.

[Figure 1: Model overview of our MCR method, which consists of two primary coherent reasoning components: product-review coherent reasoning and intra-review coherent reasoning.]
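A minimal sketch of the self-attention encoding of RoI features, assuming a single head of standard scaled dot-product attention (the actual module may add multiple heads, residual connections, and feed-forward layers):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(rois, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over RoI features."""
    Q, K, V = rois @ W_q, rois @ W_k, rois @ W_v
    d_I = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d_I))  # (l_I, l_I) pairwise weights
    return attn @ V                          # (l_I, d_I) contextualized RoIs

l_I, d_roi, d_I = 5, 16, 8
rois = rng.normal(size=(l_I, d_roi))  # stand-in for Faster R-CNN RoI features
W_q, W_k, W_v = (rng.normal(size=(d_roi, d_I)) for _ in range(3))
V_img = self_attention(rois, W_q, W_k, W_v)
print(V_img.shape)  # (5, 8)
```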

Product-Review Coherent Reasoning
The helpfulness of a review should be assessed with full awareness of the target product, not just the review itself. In this paper, we propose a product-review coherent reasoning module to effectively capture the intra- and inter-modal coherence between the target product and the review.

Intra-modal Coherence
We propose the intra-modal coherent reasoning to measure two kinds of intra-modal coherence: (i) the semantic alignments between the product text and the review text, and (ii) the semantic alignments between product images and review images. Cosine similarity is utilized to derive the intra-modal coherence matrix. For text representations $H_p^i$ and $H_r^j$, we compute the corresponding coherence matrix as follows:

$$S^H_{i,j} = \cos(H_p^i, H_r^j),$$

where $S^H_{i,j} \in \mathbb{R}^{l_{T_p} \times l_{T_r}}$, and $l_{T_p}$ and $l_{T_r}$ indicate the text lengths of the product and the review, respectively. All the coherence matrices are stacked to form the whole coherence features $S^H$. Without loss of generality, we also compute the image coherence matrix between $V_p$ and $V_r$ via cosine similarity, obtaining the image coherence matrix $S^V \in \mathbb{R}^{l_{I_p} \times l_{I_r}}$, where $l_{I_p}$ and $l_{I_r}$ indicate the number of RoI features of the product and review images, respectively.
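The intra-modal coherence matrix amounts to pairwise cosine similarity between token (or RoI) representations; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine_coherence(H_p, H_r, eps=1e-8):
    """Pairwise cosine similarity: (l_Tp, d) x (l_Tr, d) -> (l_Tp, l_Tr)."""
    H_p = H_p / (np.linalg.norm(H_p, axis=1, keepdims=True) + eps)
    H_r = H_r / (np.linalg.norm(H_r, axis=1, keepdims=True) + eps)
    return H_p @ H_r.T

l_Tp, l_Tr, d = 4, 6, 8
S = cosine_coherence(rng.normal(size=(l_Tp, d)), rng.normal(size=(l_Tr, d)))
print(S.shape)  # (4, 6): one similarity per product/review token pair
```

The same function applies unchanged to the image features $V_p$ and $V_r$.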
Subsequently, the text and image coherence matrices (i.e., $S^H$ and $S^V$) are passed to a CNN, and the top-$K$ values of each feature map are selected as the pooling features:

$$o_{intraM} = \mathrm{topK}(\mathrm{CNN}(S^H, S^V)),$$

where $o_{intraM} \in \mathbb{R}^{K \cdot M}$ denotes the intra-modal coherent reasoning features and $M$ is the number of filters used in the CNN module.
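The top-K pooling step can be sketched as follows, assuming the CNN outputs M feature maps (the CNN itself is elided here):

```python
import numpy as np

rng = np.random.default_rng(3)

def topk_pool(feature_maps, K):
    """Keep the K largest activations of each of the M feature maps and
    concatenate them into a single vector of length K * M."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)  # (M, H*W)
    topk = -np.sort(-flat, axis=1)[:, :K]                   # per-map top-K
    return topk.reshape(-1)

M, K = 3, 4
maps = rng.normal(size=(M, 5, 7))  # e.g., CNN outputs over S^H and S^V
o_intraM = topk_pool(maps, K)
print(o_intraM.shape)  # (12,) = K * M
```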
Inter-modal Coherence The intra-modal coherence ignores the cross-modal relationship between the product and the review. In order to mitigate this problem, we propose the inter-modal coherent reasoning to capture two kinds of inter-modal coherence: (i) the coherence between the review text and the product images, and (ii) the coherence between the review images and the product text.
Since the text representation $H$ and the image representation $V$ lie in two different semantic spaces, we first project them into a $d_c$-dimensional common latent space:

$$F^H = W^H H, \quad F^V = W^V V,$$

where $F^H \in \mathbb{R}^{l_T \times d_c}$ and $F^V \in \mathbb{R}^{l_I \times d_c}$ are the text and image representations in the common latent space, and $W^H$ and $W^V$ are learnable projection matrices. Taking the coherence of the review images and the product text as an example, our inter-modal coherent reasoning aligns the features of the review images $F^V_r$ based on the product text $F^H_p$. Specifically, we define the review images as the query $Q_r = W_Q F^V_r$ and the product text as the key $K_p = W_K F^H_p$, where $W_Q, W_K \in \mathbb{R}^{d_c \times d_c}$ are learnable parameter matrices. The inter-modal relationship $I^V_r$ can then be formulated as:

$$M_r = \mathrm{softmax}(Q_r K_p^\top), \quad I^V_r = M_r F^H_p,$$

where $M_r \in \mathbb{R}^{l_I \times l_T}$ is the query-attended mask. A mean-pooling operation is then conducted to get an aggregated vector of the inter-modal coherence features between the review images and the product text:

$$\tilde{I}^V_r = \mathrm{MeanPool}(I^V_r).$$

Following the same procedure, we learn the coherence features $\tilde{I}^H_r$ between the review text and the product images. Finally, we concatenate $\tilde{I}^V_r$ and $\tilde{I}^H_r$ to form the final inter-modal coherence features:

$$o_{interM} = [\tilde{I}^V_r; \tilde{I}^H_r],$$

where $[\cdot;\cdot]$ denotes the concatenation operation.
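A sketch of the inter-modal coherent reasoning for the review-image/product-text direction, assuming a plain (unscaled) softmax for the query-attended mask:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modal_coherence(F_Vr, F_Hp, W_Q, W_K):
    """Align review-image features with product-text features.

    Follows the query/key description in the text; the exact normalization
    (plain softmax, no scaling) is our assumption.
    """
    Q_r = F_Vr @ W_Q            # (l_I, d_c) image queries
    K_p = F_Hp @ W_K            # (l_T, d_c) text keys
    M_r = softmax(Q_r @ K_p.T)  # (l_I, l_T) query-attended mask
    I_Vr = M_r @ F_Hp           # text features aggregated per image RoI
    return I_Vr.mean(axis=0)    # mean-pool to a d_c-dimensional vector

l_I, l_T, d_c = 5, 7, 8
F_Vr, F_Hp = rng.normal(size=(l_I, d_c)), rng.normal(size=(l_T, d_c))
W_Q, W_K = rng.normal(size=(d_c, d_c)), rng.normal(size=(d_c, d_c))
I_tilde = inter_modal_coherence(F_Vr, F_Hp, W_Q, W_K)
print(I_tilde.shape)  # (8,)
```

Swapping the roles of the two modalities yields the other coherence direction.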

Intra-review Coherent Reasoning
Generally, consumers express their opinions in textual reviews and post images as evidence to support those opinions. To capture the coherence between the text content and images of a review, we need to grasp sufficient relational and logical information between them. To this end, we devise an intra-review coherent reasoning module that performs message propagation among the semantic nodes of a review evidence graph and then obtains an intra-review coherence score for the multimodal review. Specifically, we construct a review evidence graph $G_r$ by taking each feature (each row) of $F^H_r$ and $F^V_r$ as a semantic node and connecting all node pairs with edges, resulting in a fully-connected review evidence graph with $l_T + l_I$ nodes. In a similar manner, we construct a product evidence graph $G_p$ with $l_T + l_I$ nodes from $F^H_p$ and $F^V_p$. The hidden states of the nodes at layer $t$ are denoted as $G^t_r = \{g^t_{r,1}, \ldots, g^t_{r,n}\}$ and $G^t_p = \{g^t_{p,1}, \ldots, g^t_{p,n}\}$ for the review and product evidence graphs respectively, where $n = l_T + l_I$ and $t$ denotes the number of hops of graph reasoning. The edge weights of semantic node pairs form an adjacency matrix that is automatically learned during training. Taking the review evidence graph $G_r$ as an example, we initialize the $i$-th semantic node at the first layer as the $i$-th row of $[F^H_r; F^V_r]$, $i \in \{1, \cdots, l_T + l_I\}$. Then, the adjacency matrix $A^t$ representing the edge weights at layer $t$ is computed as:

$$\tilde{A}^t_{i,j} = \mathrm{MLP}^{t-1}([g^{t-1}_{r,i}; g^{t-1}_{r,j}]), \quad A^t = \mathrm{softmax}(\tilde{A}^t),$$

where $\mathrm{MLP}^{t-1}$ is an MLP at layer $t-1$, $\tilde{A}^t_{i,j}$ represents the semantic coefficient between a node $i$ and its neighbor $j \in \mathcal{N}_i$, and the softmax operation normalizes the semantic coefficients $\tilde{A}^t$. We can then obtain the reasoning features at layer $t$ by:

$$G^t_r = \mathrm{ReLU}(A^t G^{t-1}_r W^t),$$

where $W^t$ is a learnable transformation. By stacking $L$ graph reasoning layers, the semantic nodes can perform coherence relation reasoning by passing messages to each other.
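One hop of the graph reasoning described above can be sketched as follows; the linear edge scorer standing in for the MLP and the ReLU update rule are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_reasoning_layer(G, W_mlp, W_update):
    """One hop of message passing on the fully-connected evidence graph.

    The adjacency is predicted from pairs of node states by a (here linear)
    scorer and row-normalized with softmax; the ReLU update is an assumed
    aggregation step.
    """
    n, d = G.shape
    # pairwise scores from concatenated node states: row i*n+j = [G[i]; G[j]]
    pairs = np.concatenate(
        [np.repeat(G, n, axis=0), np.tile(G, (n, 1))], axis=1)  # (n*n, 2d)
    A = softmax((pairs @ W_mlp).reshape(n, n), axis=1)  # learned edge weights
    return np.maximum(A @ G @ W_update, 0)              # aggregated states

n, d = 6, 8  # n = l_T + l_I semantic nodes
G0 = rng.normal(size=(n, d))
W_mlp, W_update = rng.normal(size=(2 * d, 1)), rng.normal(size=(d, d))
G1 = graph_reasoning_layer(G0, W_mlp, W_update)
G2 = graph_reasoning_layer(G1, W_mlp, W_update)  # L = 2 stacked hops
print(G2.shape)  # (6, 8)
```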
We use $G^L_r$ and $G^L_p$ to denote the final reasoning hidden states of the review and product evidence graphs. Subsequently, to obtain the product-related intra-review coherent reasoning features, we adopt an attention mechanism to filter out features that are irrelevant to the product:

$$p = \mathrm{MeanPool}(G^L_p), \quad \tilde{\alpha}_i = \mathrm{MLP}([g^L_{r,i}; p]),$$

where a mean-pooling operation is employed to derive the product coherent graph embedding $p$, and the MLP is an attention layer that scores the product-related features and outputs the attention weight $\tilde{\alpha}_i$ for the $i$-th node. After normalizing the attention weights with a softmax function, we use a linear combination to aggregate the intra-review coherent reasoning results $o_{IRC}$:

$$\alpha = \mathrm{softmax}(\tilde{\alpha}), \quad o_{IRC} = \sum_{i=1}^{n} \alpha_i \, g^L_{r,i}.$$
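The product-guided attention aggregation can be sketched as follows, with a linear attention scorer `w_attn` as a hypothetical parameterization of the attention MLP:

```python
import numpy as np

rng = np.random.default_rng(6)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def intra_review_features(G_r, G_p, w_attn):
    """Aggregate review-graph node states, attending with the product graph.

    p is the mean-pooled product graph embedding; each review node is scored
    against it by the linear attention layer `w_attn`, then nodes are
    combined with the normalized weights.
    """
    p = G_p.mean(axis=0)  # product coherent graph embedding
    scores = np.array([np.concatenate([g, p]) @ w_attn for g in G_r])
    alpha = softmax(scores)  # (n,) attention weights
    return alpha @ G_r       # o_IRC, shape (d,)

n, d = 6, 8
G_r, G_p = rng.normal(size=(n, d)), rng.normal(size=(n, d))
o_IRC = intra_review_features(G_r, G_p, rng.normal(size=2 * d))
print(o_IRC.shape)  # (8,)
```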

Review Helpfulness Prediction
We concatenate the intra-modal product-review coherence features $o_{intraM}$, the inter-modal product-review coherence features $o_{interM}$, and the intra-review coherence features $o_{IRC}$ to form the final multi-perspective coherence features:

$$o_{final} = [o_{intraM}; o_{interM}; o_{IRC}].$$

The final helpfulness prediction layer feeds $o_{final}$ into a linear layer to calculate a ranking score:

$$f(p_i, r_{i,j}) = W_r \, o_{final} + b_r,$$

where $W_r$ and $b_r$ denote the projection parameter and bias term, $p_i$ represents the information of the $i$-th product, and $r_{i,j}$ is the $j$-th review for $p_i$. The standard pairwise ranking loss is adopted to train our model:

$$\mathcal{L} = \max\big(0, \beta - (f(p_i, r^+) - f(p_i, r^-))\big),$$

where $r^+, r^- \in R_i$ are an arbitrary pair of reviews for $p_i$ such that $r^+$ has a higher helpfulness score than $r^-$, and $\beta$ is the margin hyperparameter. Since our MCR model is fully differentiable, it can be trained end-to-end by gradient descent.
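The pairwise ranking loss for a single (r+, r−) pair can be sketched directly:

```python
def pairwise_ranking_loss(score_pos, score_neg, beta=1.0):
    """Margin-based pairwise ranking loss for one (r+, r-) review pair:
    max(0, beta - (f(p, r+) - f(p, r-)))."""
    return max(0.0, beta - (score_pos - score_neg))

# Correctly ordered pair with a comfortable margin: zero loss.
print(pairwise_ranking_loss(3.0, 1.0))  # 0.0
# Mis-ordered pair: the loss grows with the size of the violation.
print(pairwise_ranking_loss(0.5, 1.5))  # 2.0
```

In training, this loss would be averaged over sampled pairs of reviews for each product.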

Datasets
To the best of our knowledge, there is no benchmark dataset for the Multimodal Review Helpfulness Prediction task (MRHP). Hence, we construct two benchmark datasets (Lazada-MRHP and Amazon-MRHP) from popular e-commerce platforms to evaluate our method.

Lazada-MRHP in Indonesian
Lazada.com is a popular e-commerce platform in Southeast Asia whose reviews are written in Indonesian. We construct the Lazada-MRHP dataset by crawling the product information (title, description, and images) and user-generated reviews (text content and images) from Lazada. To ensure that the helpfulness-voting feedback is reliable, we only extract reviews published from 2018 to 2019. We focus on three product categories: Clothing, Shoes & Jewelry (CS&J), Electronics (Elec.), and Home & Kitchen (H&K).

Amazon-MRHP in English
The Amazon review dataset (Ni et al., 2019) was collected from Amazon.com, containing meta-data of products and customer reviews from 1996 to 2018. We extract the product information and associated reviews published from 2016 to 2018. Since there are no review images in the original Amazon dataset, we crawl the images for each product and review from the Amazon.com platform. Similar to Lazada-MRHP, the products and reviews also belong to three categories: Clothing, Shoes & Jewelry (CS&J), Electronics (Elec.), and Home & Kitchen (H&K).

[Table 2: Statistics of the two datasets. #P and #R represent the number of products and reviews, respectively.]
Learning from user feedback has been shown to be effective for review helpfulness prediction (Fan et al., 2019; Chen et al., 2019). Specifically, the helpfulness votes received by each review can be treated as a pseudo label indicating the helpfulness level of the review. Following the same data processing as Fan et al. (2019), we filter out reviews that received 0 votes, since their user-feedback state is unknown. Based on the votes received by a review, we use logarithmic intervals to categorize reviews into five helpfulness levels. Specifically, we map the number of votes into five intervals (i.e., [1, 2), [2, 4), [4, 8), [8, 16), [16, ∞)) based on an exponential with base 2. The five intervals correspond to the five helpfulness scores $s_{i,j} \in \{0, 1, 2, 3, 4\}$, where a higher score indicates a more helpful review. The statistics of the two datasets are shown in Table 2. For both Lazada-MRHP and Amazon-MRHP, we use 20% of the training set per category as validation data.
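The base-2 logarithmic bucketing of votes into helpfulness scores can be implemented directly:

```python
def helpfulness_score(votes):
    """Map a positive vote count to a helpfulness score in {0, ..., 4}
    using base-2 logarithmic intervals: [1,2), [2,4), [4,8), [8,16), [16,inf).
    Reviews with 0 votes are filtered out beforehand."""
    assert votes >= 1
    # bit_length() - 1 equals floor(log2(votes)) for positive integers
    return min(votes.bit_length() - 1, 4)

print([helpfulness_score(v) for v in (1, 2, 3, 4, 7, 8, 15, 16, 100)])
# [0, 1, 1, 2, 2, 3, 3, 4, 4]
```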

Implementation Details
For a fair comparison, we adopt the same data processing for all baselines. We use the ICU tokenizer and the NLTK toolkit (Loper and Bird, 2002) to tokenize the text in Lazada-MRHP and Amazon-MRHP, respectively. Each image is represented by RoI features with 2048 dimensions. For the network configuration, we initialize the word embedding layers with the pre-trained 300D GloVe word embeddings (http://nlp.stanford.edu/data/glove.6B.zip) for Amazon-MRHP and the fastText multilingual word vectors (https://fasttext.cc/docs/en/crawl-vectors.html) for Lazada-MRHP. The text n-gram kernel sizes are set to 1, 3, and 5 with 128 hidden dimensions. For the image representations, we set the encoded feature size $d_I$ to 128, and the size of the common latent space $d_c$ is also set to 128. We stack two graph reasoning layers (i.e., $L = 2$), each with a hidden dimension of 128. We adopt the Adam optimizer (Kingma and Ba, 2014) to train our model with a batch size of 32. The margin hyperparameter $\beta$ is set to 1.

Compared Methods
We compare MCR with several state-of-the-art review helpfulness prediction methods. First, we compare MCR with four strong methods that rely only on the text content of reviews: the Bilateral Multi-Perspective Matching (BiMPM) model, Embedding-gated CNN (EG-CNN) (Chen et al., 2018), the Convolutional Kernel-based Neural Ranking Model (Conv-KNRM) (Dai et al., 2018), and the Product-aware Helpfulness Prediction Network (PRHNet) (Fan et al., 2019). Since we are the first to leverage images in the review for helpfulness prediction of multimodal reviews, we also compare our MCR model with two strong multimodal reasoning techniques: SSE-Cross (Abavisani et al., 2020), which leverages stochastic shared embeddings to fuse different modality representations, and D&R Net (Xu et al., 2020), which adopts a decomposition and relation network to model both cross-modality contrast and semantic association.

Evaluation Metrics
In this paper, we propose a pairwise ranking loss function for review helpfulness prediction, which benefits from the sampling of informative negative examples. Since the output of MCR is a list of reviews ranked by their helpfulness scores, we adopt two authoritative ranking-based metrics to evaluate model performance: Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG@N) (Järvelin and Kekäläinen, 2017), with N set to 3 and 5 in the experiments. MAP is a widely-used measure of the general ranking performance on the whole candidate review set, while NDCG@N takes into account only the top N reviews, reflecting the scenario in which customers read a limited number of reviews.

Experimental Results

Main Results
Since we adopt the pairwise ranking loss for review helpfulness prediction, we treat the product text as the query and the associated reviews as candidates for ranking. Table 3 and Table 4 report the results of MCR and the baselines on Lazada-MRHP and Amazon-MRHP, respectively. From the results, we can make the following observations. First, EG-CNN performs worse than the other text-only baselines, because EG-CNN only considers the hidden features of the review text, while the other text-only methods additionally utilize the product information as a helpfulness signal. Second, the multimodal baselines (SSE-Cross and D&R Net) perform significantly better than the text-only baselines. This verifies that multimodal information in reviews can help models discover helpful reviews. Third, MCR performs even better than the strong multimodal competitors. For example, on Lazada-MRHP, MAP and NDCG@3 increase by 2.9% and 3.5% respectively over the best baseline method (i.e., D&R Net). We observe similar trends on Amazon-MRHP. The advantage of MCR comes from its capability of capturing the product-review and intra-review coherence.
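For reference, the MAP and NDCG@N metrics reported in these tables can be sketched as follows; the graded-gain NDCG variant and a binary-relevance AP with threshold 1 are our assumptions about the exact formulations:

```python
import math

def ndcg_at_n(relevances, n):
    """NDCG@N for one ranked list; `relevances` are gold helpfulness scores
    in predicted order, with gains rel / log2(rank + 1) (a common variant)."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:n]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:n]))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision(relevances, threshold=1):
    """AP treating reviews with gold score >= threshold as relevant."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevances):
        if rel >= threshold:
            hits += 1
            total += hits / (i + 1)
    return total / hits if hits else 0.0

ranked = [4, 0, 2, 1, 0]  # gold scores of reviews in predicted order
print(ndcg_at_n(ranked, 3))
print(average_precision(ranked))
```

MAP then averages `average_precision` over all products in the test set.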

Ablation Study
To analyze the effectiveness of the different components of MCR, we conduct detailed ablation studies: removing intra-review coherence (denoted as w/o intra-review), removing intra-modal coherence between product and review images (w/o intra-modal-I), removing intra-modal coherence between product and review texts (w/o intra-modal-II), removing inter-modal coherence between review text and product images (w/o inter-modal-I), and removing inter-modal coherence between review images and product text (w/o inter-modal-II). The ablation results on the CS&J category of the Lazada and Amazon datasets are summarized in Table 5. We observe that the intra-review coherent reasoning has the largest impact on the performance of MCR, suggesting that the images within a review are informative evidence for review helpfulness prediction. The improvements from the intra-modal and inter-modal coherent reasoning in the product-review coherent reasoning module are also significant. However, intra-modal-I and intra-modal-II have a smaller impact on MCR than the other two variants. This may be because product images are usually beautified, so there are significant differences between the product images and the images posted by consumers. Unsurprisingly, combining all components achieves the best performance on both datasets.

Case Study
To gain more insight into the multimodal review helpfulness prediction task, we use an exemplary case selected from the test set of the Home & Kitchen category of Amazon-MRHP to empirically investigate the effectiveness of our model. Table 6 shows a product and two associated reviews with ground-truth helpfulness scores voted by consumers. These two reviews are ranked correctly by our MCR method but wrongly by strong baselines (e.g., Conv-KNRM and PRHNet). The text content of both reviews contains negative emotion words (e.g., "disappointed" and "sad") and expresses similar information: "the product size does not meet my expectation". It is hard for text-only methods to discriminate the helpfulness of these two reviews by considering the text content alone. After analyzing the images within the reviews, we can see that Review 1 is helpful since it provides two appropriate bed images with the bought comforter as evidence that supports the claim in the text content. In contrast, Review 2 provides an inappropriate image of the product package, which cannot support the claim about the product size. This verifies that it is essential to capture the complex semantic relationship between the images and text content within a review for helpfulness prediction.

Product Information
Bedding printed comforter set (king, grey) with 2 pillow shams -luxurious soft brushed microfiber -goose down alternative comforter Review 1 (Helpfulness Score: 4) Though I like the color and look, I am very disappointed in the size. The picture on amazon shows the comforter going all the way to the floor. To be sure, I ordered the king size. As you can see in the photos, I have a queen bed and the comforter still has 18" to the floor on each side. I will try to fix it with a bed skirt.
Review 2 (Helpfulness Score: 1) This comforter is very fluffy and does have a nice feel to it, but is far too small to actually cover much more than the top of the bed. In the picture, it nearly touched the floor on both visible sides. Likewise, it was described as a printed comforter set (grey, queen) with 2 pillow shams -luxurious soft brushed microfiber -goose down alternative comforter by utopia bedding but the item itself said nothing of being a down alternative. I'm sad that this doesn't meet my expectations. Table 6: An example product and two associated reviews. We use underlines to highlight main opinions.

Conclusion
Multimodal review analysis (MRA) is extremely important for helping businesses and consumers quickly acquire valuable information from user-generated reviews. This paper is the first attempt to explore the multimodal review helpfulness prediction (MRHP) task, which aims at analyzing review helpfulness from text and images. We propose a multi-perspective coherent reasoning (MCR) method to solve the MRHP task, which fully explores the product-review coherence and intra-review coherence from both the textual and visual modalities. In addition, we construct two multimodal review datasets to evaluate the effectiveness of MCR, which may push forward research in this field. Extensive experimental results demonstrate that MCR significantly outperforms the baselines by comprehensively exploiting the images associated with reviews.