Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction

Multimodal Review Helpfulness Prediction (MRHP) aims to rank product reviews based on predicted helpfulness scores and has been widely applied in e-commerce by presenting customers with useful reviews. Previous studies commonly employ fully-connected neural networks (FCNNs) as the final score predictor and a pairwise loss as the training objective. However, FCNNs have been shown to perform inefficient splitting of review features, making it difficult for the model to clearly differentiate helpful from unhelpful reviews. Furthermore, the pairwise objective, which operates on review pairs, may not completely capture the MRHP goal of producing a ranking for the entire review list, and possibly induces low generalization during testing. To address these issues, we propose a listwise attention network that clearly captures the MRHP ranking context and a listwise optimization objective that enhances model generalization. We further propose a gradient-boosted decision tree as the score predictor to efficaciously partition product reviews' representations. Extensive experiments demonstrate that our method achieves state-of-the-art results and improved generalization performance on two large-scale MRHP benchmark datasets.

1 Introduction

E-commerce platforms, such as Amazon and Lazada, have achieved steady development. These platforms generally provide purchasers' reviews to supply justifying information for new consumers and help them make decisions. Nevertheless, the quality and usefulness of reviews can vary hugely: some are helpful, with coherent and informative content, while others are unhelpful, with trivial or irrelevant information. For this reason, the Multimodal Review Helpfulness Prediction (MRHP) task is proposed. It ranks the reviews by predicting their helpfulness scores based on the textual and visual modality of products and reviews, because helpful reviews should comprise not only precise and informative textual material, but also images consistent with the text content (Liu et al., 2021; Nguyen et al., 2022). This can help consumers find helpful reviews instead of unhelpful ones, resulting in more appealing e-commerce platforms.
In MRHP, multimodal reviews naturally form ranking partitions based on user votes, where each partition exhibits a distinct helpfulness feature level (Ma et al., 2021). As such, the MRHP score regressor's function is to assign scores that indicate the partition for the hidden features of product reviews. However, current MRHP approaches employ fully-connected neural networks (FCNNs), which cannot fulfill the partition objective. In particular, FCNNs are ineffective in feature scaling and transformation, thus being inadept at feature space splitting and failing to work efficiently in ranking problems that involve ranking partitions (Beutel et al., 2018; Qin et al., 2021). An illustration is given in Figure 1, where the helpfulness scores predicted by FCNNs do not lucidly separate helpful and unhelpful reviews. Worse, some unhelpful reviews possess logits that can even stay in the range of helpful ones, bringing about fallacious ranking.
In addition to incompetent model architectures, existing MRHP frameworks also employ a suboptimal loss function: they are mostly trained on a pairwise loss to learn review preferences, which unfortunately mismatches the listwise nature of review ordering prediction. Firstly, the mismatch might empirically give rise to inefficient ranking performance (Pasumarthi et al., 2019; Pobrotyn and Białobrzeski, 2021). Secondly, the pairwise training loss considers all pairs of reviews as equivalent. In consequence, the loss cannot differentiate a pair of useful and not useful reviews from a pair of moderately useful and not useful ones, which results in a model that distinguishes poorly between useful and moderately useful reviews. To address these issues, we first propose a Gradient-Boosted Decision Tree (GBDT) as the helpfulness score regressor, to utilize both its huge capacity for partitioning the feature space (Leboeuf et al., 2020) and its differentiability, compared with standard decision trees, for end-to-end training. We achieve the partition capability by implementing the split (internal) nodes of the tree with non-linear single perceptrons, which route review features to specific subspaces in a soft manner.
Furthermore, we develop a theoretical analysis to demonstrate that pairwise training indeed yields lower model generalization than the listwise approach. We proceed to propose a novel listwise training objective for the proposed MRHP architecture. We also equip our architecture with a listwise attention network that models the interaction among the reviews to capture the listwise context for the MRHP ranking task.
In sum, our contributions are four-fold:
• We propose a novel gradient-boosted decision tree score predictor for multimodal review helpfulness prediction (MRHP) to partition product review features and properly infer the helpfulness score distribution.
• We propose a novel listwise attention module for the MRHP architecture that conforms to the listwise context of the MRHP task by relating reviews in the list.
• We perform theoretical study with the motivation of ameliorating the model generalization error, and accordingly propose a novel MRHP training objective which satisfies our aim.
• We conduct comprehensive experiments on two benchmark datasets and find that our approach significantly outperforms both text-only and multimodal baselines, accomplishing state-of-the-art results for MRHP.

2 Background
In this section, we recall the Multimodal Review Helpfulness Prediction (MRHP) problem. Then, we introduce theoretical preliminaries which form the basis of our formal analysis of the ranking losses for the MRHP problem in the next section.

Problem Definition
Following (Liu et al., 2021; Han et al., 2022; Nguyen et al., 2022), we formulate MRHP as a ranking task. In detail, we consider an instance X_i to consist of a product item p_i, composed of product description T^{p_i} and images I^{p_i}, and its respective review list R_i = {r_{i,1}, r_{i,2}, . . ., r_{i,|R_i|}}. Each review r_{i,j} carries user-generated text T^{r_{i,j}}, images I^{r_{i,j}}, and an integer scalar label y_{i,j} ∈ {0, 1, . . ., S} denoting the helpfulness score of review r_{i,j}. The ground-truth result associated with X_i is the descending order determined by the helpfulness score list Y_i = {y_{i,1}, y_{i,2}, . . ., y_{i,|R_i|}}. The MRHP task is to generate helpfulness scores f_{i,j} = f(p_i, r_{i,j}) whose descending order matches the ground-truth ranking order, where f represents the helpfulness prediction model taking ⟨p_i, r_{i,j}⟩ as the input.
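As a toy illustration of this formulation, the model's output ranking is simply the descending order of its predicted scores, compared against the order induced by the ground-truth labels. The scores and labels below are invented for illustration, not taken from the MRHP datasets.

```python
# A minimal sketch of the MRHP ranking formulation: given predicted
# helpfulness scores f_{i,j} for one product's review list, the output
# ranking is the descending order of the scores, compared against the
# order induced by the ground-truth labels y_{i,j}.

def ranking_from_scores(scores):
    """Return review indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])

predicted = [0.3, 2.1, -0.5, 1.2]   # f_{i,j} for 4 reviews (illustrative)
labels    = [1, 4, 0, 2]            # y_{i,j} in {0, ..., S}

pred_order = ranking_from_scores(predicted)
gold_order = ranking_from_scores(labels)
print(pred_order)                   # [1, 3, 0, 2]
print(pred_order == gold_order)     # True: scores reproduce the gold order
```

Here a perfect model need not predict the labels themselves, only scores whose ordering agrees with the label ordering.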

Analysis of Generalization Error
The analysis involves the problem of learning a deep θ-parameterized model f_θ : X → Y that maps the input space X to the output space Y, and a stochastic learning algorithm A to solve the optimization problem

min_θ R_true(f_θ) = E_{(x,y)∼P}[l(f_θ; (x, y))],  (2)

where P denotes the distribution of (x, y), l the loss function based on the difference between ŷ = f_θ(x) and y, and R_true(f_θ) = E_{(x,y)∼P}[l(f_θ; (x, y))] is dubbed the true risk.
Since P is unknown, R_true is alternatively addressed by optimizing the surrogate empirical risk R_emp(f_θ) = (1/N) Σ_{i=1}^{N} l(f_θ; (x_i, y_i)), where D = {(x_i, y_i)}_{i=1}^{N} denotes a training dataset drawn from P that f_{θ_D} is trained upon. Because the aim of deep neural model training is to produce a model f_θ that provides a small gap between the performance over D, i.e. R_emp(f_{θ_D}), and over any unseen test set from P, i.e. R_true(f_{θ_D}), the analysis takes as its main focus the generalization error E(f_{θ_D}) = R_true(f_{θ_D}) − R_emp(f_{θ_D}), as its objective the achievement of a tight bound on E(f_{θ_D}), and subsequently the following foundation regarding the loss function's Lipschitzness:

|l(ŷ, y) − l(ŷ′, y)| ≤ γ |ŷ − ŷ′|,

where | · | denotes the l_1-norm and K the dimension of the output ŷ.
Given this foundation, we have the following connection between the properties of loss functions and the generalization error.

Theorem 1. Consider a loss function l with 0 ≤ l(ŷ, y) ≤ L that is convex and γ-Lipschitz with respect to ŷ. Suppose the stochastic learning algorithm A is executed for T iterations, with an annealing rate λ_t, to solve problem (2). Then, a generalization error bound in terms of γ, L, T, λ_t, and δ holds with probability at least 1 − δ (Akbari et al., 2021).

Theorem (1) implies that by establishing a loss function with smaller values of γ and L, we can improve the model generalization performance.

3 Methodology
In this section, we elaborate on our proposed architecture: the listwise attention network, the tree-based helpfulness regressor, and the listwise ranking loss, along with its comparison against the pairwise one from the theoretical perspective. The overall architecture is illustrated in Figure 2.

Multimodal Encoding
Our model receives the product description T^{p_i}, product images I^{p_i}, review text T^{r_{i,j}}, and review images I^{r_{i,j}} as input. We perform the encoding procedure for these inputs as follows.

Textual Encoding. For both product text T^{p_i} and review text T^{r_{i,j}}, we index their sequences of words into word embeddings and forward them to the respective LSTM layers to yield token-wise representations.

Visual Encoding. We extract object features {e^{p_i}_t}_{t=1}^{m} and {e^{r_{i,j}}_t}_{t=1}^{m} for product and review images, respectively. We then feed those object features into self-attention modules to obtain visual representations:

V^{p_i} = SelfAttn({e^{p_i}_t}_{t=1}^{m}),  V^{r_{i,j}} = SelfAttn({e^{r_{i,j}}_t}_{t=1}^{m}),

where V^{p_i}, V^{r_{i,j}} ∈ R^{m×d}, and d denotes the hidden size.

Coherence Reasoning
We then learn intra-modal, inter-modal, and intra-entity coherence among product-review elements.
Intra-modal Coherence. There are two types of intra-modal coherence relations: (1) product text - review text and (2) product image - review image.
Initially, we designate self-attention modules to capture the intra-modal interactions. The intra-modal interaction features are then passed to a CNN and condensed into hidden vectors via a pooling layer.

Inter-modal Coherence. Inter-modal coherence comprises (1) product text (pt) - review image (ri) and (2) product image (pi) - review text (rt). Similar to the intra-modal coherence, we first perform cross-modal correlation by leveraging the self-attention mechanism. Thereafter, we pool the above features and concatenate the pooled vectors to attain the inter-modal vector.

Intra-entity Coherence. Analogous to the inter-modal coherence, we also conduct self-attention and pooling computation, but on (1) product text (pt) - product image (pi) and (2) review text (rt) - review image (ri).

Eventually, the concatenation of the intra-modal, inter-modal, and intra-entity vectors becomes the result of the coherence reasoning phase.
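The per-branch pattern above (interact two feature sequences, pool, concatenate) can be sketched as follows. This is a simplified, single-head, projection-free sketch under assumed shapes; the paper's actual modules include CNN layers and learned projections that are omitted here.

```python
import numpy as np

# Illustrative sketch of one coherence branch: two feature sequences
# (e.g. product-text tokens and review-image objects) interact through
# attention, are max-pooled into fixed-size vectors, and the pooled
# vectors from all branches are later concatenated.

def interact_and_pool(A, B):
    """A: (n, d), B: (m, d) -> (2d,) pooled coherence vector."""
    d = A.shape[1]
    attn = np.exp(A @ B.T / np.sqrt(d))
    attn /= attn.sum(axis=1, keepdims=True)   # rows: softmax over B
    A_ctx = attn @ B                          # A attended over B
    return np.concatenate([A_ctx.max(axis=0), B.max(axis=0)])

rng = np.random.default_rng(1)
H_pt = rng.normal(size=(7, 64))   # 7 product-text token features
V_ri = rng.normal(size=(4, 64))   # 4 review-image object features
z_branch = interact_and_pool(H_pt, V_ri)
print(z_branch.shape)             # (128,)
```

Each coherence relation (intra-modal, inter-modal, intra-entity) would contribute one such pooled vector to the final concatenation.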

Listwise Attention Network
In our proposed listwise attention network, we encode list-contextualized representations to consider the relative relationships among reviews. We achieve this by utilizing the self-attention mechanism to relate the list-independent product review features {z_{i,1}, z_{i,2}, . . ., z_{i,|R_i|}}:

{z^{list}_{i,1}, . . ., z^{list}_{i,|R_i|}} = SelfAttn({z_{i,1}, . . ., z_{i,|R_i|}}),

where R_i denotes the review list associated with product p_i.
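The listwise attention step can be sketched as self-attention across the stacked review vectors of one product. The single-head, projection-free form below is a simplifying assumption for illustration.

```python
import numpy as np

# Sketch of the listwise attention network: review vectors z_{i,j} for
# one product are stacked and passed through self-attention so that each
# review representation is contextualized by the rest of the list.

def listwise_attn(Z):
    """Z: (|R_i|, d) review features -> (|R_i|, d) list-contextualized."""
    d = Z.shape[1]
    w = np.exp(Z @ Z.T / np.sqrt(d))
    w /= w.sum(axis=1, keepdims=True)   # row-wise softmax over the list
    return w @ Z

Z = np.random.default_rng(2).normal(size=(6, 32))  # 6 reviews, d = 32
Z_list = listwise_attn(Z)
print(Z_list.shape)  # (6, 32)
```

The point is that each row of the output mixes information from every review in the list, which is exactly the ranking context a per-review FCNN scorer lacks.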

Gradient-boosted Decision Tree for Helpfulness Estimation
In this section, we delineate our gradient-boosted decision tree for predicting helpfulness scores that efficaciously partition review features.

Tree Structure. We construct a binary decision tree of depth d_tree, composed of internal nodes N (|N| = 2^{d_tree−1} − 1) and leaf nodes L (|L| = 2^{d_tree−1}). Our overall tree structure is depicted in Figure 2.

Score Prediction. Receiving the list-attended vectors z^{list}_{i,j}, our decision tree performs soft partitioning through probabilistic routing of those vectors to their target leaf nodes. In this manner, each internal node n calculates routing decision probabilities p^{left}_n and p^{right}_n = 1 − p^{left}_n, which denote the likelihood of directing the vector to the left and right sub-tree, respectively. Thereupon, the probability of reaching leaf node l is

μ_l = Π_{n ∈ P(l)} (p^{left}_n)^{1_{ln}} (p^{right}_n)^{1_{rn}},

where 1_{ln} denotes the indicator of whether leaf node l belongs to the left sub-tree of internal node n, equivalently for 1_{rn}, and P(l) the node sequence on the path to leaf l. For example, in Figure 2, the routing probability to leaf 6 is μ_6 = p^{right}_1 p^{left}_3 p^{right}_6. For the score inference at leaf node l, we employ a linear layer, yielding the helpfulness score s_{l,i,j} generated at leaf node l. Lastly, due to the probabilistic routing approach, the final helpfulness score f_{i,j} is the average of the leaf node scores weighted by the probabilities of reaching the leaves:

f_{i,j} = Σ_{l ∈ L} μ_l s_{l,i,j}.
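The soft routing can be made concrete with a small numeric sketch. All parameter shapes below are assumptions for a depth-3 tree (3 internal nodes, 4 leaves); the actual model learns these parameters end-to-end.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch of the soft decision-tree score predictor. Each internal node n
# routes a review vector left with probability sigmoid(w_n . z + b_n);
# the leaf-reaching probability mu_l multiplies the routing probabilities
# along the path, and the final score is the mu-weighted average of
# per-leaf linear scores.

def soft_tree_score(z, W_int, b_int, W_leaf, b_leaf):
    p_left = sigmoid(W_int @ z + b_int)        # (3,): internal nodes 0..2
    # leaf paths for a depth-3 complete binary tree (node 0 is the root):
    mu = np.array([
        p_left[0] * p_left[1],                 # leaf 0: left,  left
        p_left[0] * (1 - p_left[1]),           # leaf 1: left,  right
        (1 - p_left[0]) * p_left[2],           # leaf 2: right, left
        (1 - p_left[0]) * (1 - p_left[2]),     # leaf 3: right, right
    ])
    leaf_scores = W_leaf @ z + b_leaf          # (4,): linear leaf scores
    return float(mu @ leaf_scores), mu         # probability-weighted score

rng = np.random.default_rng(3)
d = 16
z = rng.normal(size=d)                         # one list-attended review vector
score, mu = soft_tree_score(z, rng.normal(size=(3, d)), rng.normal(size=3),
                            rng.normal(size=(4, d)), rng.normal(size=4))
print(round(mu.sum(), 6))  # 1.0: the score is a convex mix over leaves
```

Because every operation is differentiable, gradients flow through the routing probabilities, which is what distinguishes this predictor from a standard hard-split decision tree.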

Listwise Ranking Objective
Since the MRHP task aims to produce a helpfulness order for a list of reviews, we propose to follow a listwise approach to compare the predicted helpfulness scores with the ground truth. Initially, we convert the two lists, of predicted scores f_{i,j} and of ground-truth labels y_{i,j}, into probability distributions f′_{i,j} and y′_{i,j}. Subsequently, we conduct a theoretical derivation and arrive at interesting properties of the listwise computation.
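One standard instantiation of such a listwise comparison is the ListNet-style cross-entropy between softmax distributions. The sketch below is an assumed illustration of this family, not necessarily the paper's exact discrimination function.

```python
import numpy as np

# Sketch of a listwise objective: predicted scores and labels for one
# review list are each mapped to a probability distribution via softmax,
# and their cross-entropy is minimized.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def listwise_loss(scores, labels):
    f_prime = softmax(np.asarray(scores, dtype=float))  # f'_{i,j}
    y_prime = softmax(np.asarray(labels, dtype=float))  # y'_{i,j}
    return float(-(y_prime * np.log(f_prime)).sum())

perfect = listwise_loss([3.0, 1.0, 0.0], [3.0, 1.0, 0.0])
wrong   = listwise_loss([0.0, 1.0, 3.0], [3.0, 1.0, 0.0])
print(perfect < wrong)  # True: matching orders give lower loss
```

Unlike a pairwise loss, this objective sees the whole list at once, so a swap between a very helpful and an unhelpful review is penalized more than a swap between two similar reviews.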
Theoretical Derivation. Our derivation demonstrates that the discrimination computations of both the listwise and pairwise functions (Liu et al., 2021; Han et al., 2022; Nguyen et al., 2022) satisfy the preconditions in Theorem (1).

Lemma 1. Given the listwise discrimination function L_list on the total training set, where P denotes the product set, L_list is convex and γ_list-Lipschitz with respect to f′_{i,j}.

Lemma 2. Given the pairwise discrimination function L_pair on the total training set, where r^+, r^− denote two random indices in R_i with y_{i,r^+} > y_{i,r^−}, L_pair is convex and γ_pair-Lipschitz.

Based upon the above theoretical basis, we investigate the connection between L_list and L_pair.
Theorem 2. Let L_list and L_pair be γ_list-Lipschitz and γ_pair-Lipschitz, respectively. Then, γ_list ≤ γ_pair.

Theorem 3. Let 0 ≤ L_list ≤ L̄_list and 0 ≤ L_pair ≤ L̄_pair. Then, L̄_list ≤ L̄_pair.

We combine Theorems (1), (2), and (3) to achieve the following result.
Theorem 4. Let f^{list}_{θ_D} and f^{pair}_{θ_D} be models trained on D with the listwise and pairwise objectives, respectively. Then, the generalization error bound of the listwise-trained model is no larger than that of the pairwise-trained one. As Theorem (4) states, models optimized by the listwise function achieve a tighter bound on the generalization error than those optimized with the pairwise function, thus upholding better generalization performance.
We provide proofs of all the lemmas and theorems in Appendix A. Indeed, empirical results in Section 4.6 also verify our theorems.
With this foundation, we propose to utilize the listwise discrimination as the objective loss function to train our MRHP model.

4 Experiments

Datasets
For evaluation, we conduct experiments on two large-scale MRHP benchmark datasets: Lazada-MRHP and Amazon-MRHP. We present the dataset statistics in Appendix B.

Amazon-MRHP (Liu et al., 2021) includes product and review content crawled from Amazon.com, the international e-commerce platform, between 2016 and 2018. All of the product and review texts are expressed in English.

Lazada-MRHP (Liu et al., 2021) comprises product information and user-generated reviews from Lazada.com, a popular e-commerce platform in Southeast Asia. Both product and review texts are written in Indonesian.

Implementation Details
For input texts, we leverage pretrained fastText embeddings (Bojanowski et al., 2017) and 300-dimensional GloVe word vectors (Pennington et al., 2014) for the Lazada-MRHP and Amazon-MRHP datasets, respectively. Each embedded word sequence is passed into a 1-layer LSTM whose hidden dimension is 128. For input images, we extract ROI features of 2048 dimensions and encode them into 128-dimensional vectors. Our gradient-boosted decision tree score predictor has a depth of 3 and 5 on the Lazada-MRHP and Amazon-MRHP datasets, respectively, determined by validation performance. We adopt the Adam optimizer, with a batch size of 32 and a learning rate of 1e−3, to train our entire architecture in an end-to-end fashion.

Baselines
We compare our approach with a comprehensive list of baselines:
• BiMPM (Wang et al., 2017): a ranking model that uses 2 BiLSTM layers to encode input sentences.
• EG-CNN (Chen et al., 2018): a RHP baseline which leverages character-level representations and domain discriminator to improve cross-domain RHP performance.
• Conv-KNRM (Dai et al., 2018): a CNN-based system which uses kernel pooling on multi-level n-gram encodings to produce ranking scores.
• PRH-Net (Fan et al., 2019): a RHP baseline that receives product metadata and raw review text as input.
• SSE-Cross (Abavisani et al., 2020): a cross-modal attention-based approach to filter non-salient elements in both visual and textual input components.

Main Results
Inspired by previous works (Liu et al., 2021; Han et al., 2022; Nguyen et al., 2022), we report Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG@N), with N = 3 and N = 5. We include the performance of baseline models and our approach in Tables 1 and 2.
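These metrics can be computed for one ranked review list as follows. The conventions assumed here (a review is "relevant" for MAP when its label is positive; NDCG uses the (2^rel − 1)/log2(rank + 1) gain) are standard choices, not necessarily the paper's exact evaluation script.

```python
import numpy as np

# Sketch of the reported metrics for one list of labels, ordered by the
# model's predicted ranking.

def average_precision(labels_in_rank):
    """MAP component for one list; relevance = label > 0."""
    hits, score = 0, 0.0
    for k, rel in enumerate(labels_in_rank, start=1):
        if rel > 0:
            hits += 1
            score += hits / k      # precision at each relevant position
    return score / max(hits, 1)

def ndcg_at_n(labels_in_rank, n):
    def dcg(rels):
        return sum((2 ** r - 1) / np.log2(k + 1)
                   for k, r in enumerate(rels[:n], start=1))
    ideal = dcg(sorted(labels_in_rank, reverse=True))
    return dcg(labels_in_rank) / ideal if ideal > 0 else 0.0

ranked = [4, 2, 0, 1, 0]           # labels in predicted rank order
print(round(average_precision(ranked), 3))  # 0.917
print(round(ndcg_at_n(ranked, 3), 3))       # 0.971
```

In the tables, both metrics are averaged over all products in the test set.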
On the Amazon dataset, we consistently outperform prior methods in both the textual and multimodal settings. Particularly, our architecture improves over Contrastive-MCR by 15.2 MAP points in Clothing, 20.4 NDCG@3 points in Electronics, and 21.0 NDCG@5 points in the Home subset. Furthermore, we accomplish a gain of 2.2 MAP points in Clothing over PRH-Net, 16.4 NDCG@3 points in Electronics, and 11.8 NDCG@5 points in the Home category over the Conv-KNRM baseline, where PRH-Net and Conv-KNRM are the best prior text-only baselines.
On the Lazada dataset, which is in Indonesian, we outperform Contrastive-MCR by a significant margin of 10.4 MAP points in Home, 11.6 NDCG@5 points in Electronics, and 12.4 NDCG@3 points in the Clothing domain. The text-only variant of our model also gains a considerable improvement of 4.7 NDCG@5 points in Clothing and 5.0 MAP points in Electronics over PRH-Net, and 1.4 NDCG@3 points in Home over the Conv-KNRM model.
These outcomes demonstrate that our method is able to produce more sensible helpfulness scores to improve the review ranking process, not only being efficacious in English but also generalizing to other languages as well. Over and above, it is worth pointing out that in Lazada-Electronics, the textual setting of our approach even achieves higher helpfulness

Table 1: Helpfulness review prediction results on the Amazon-MRHP dataset.

prediction capacity than the state-of-the-art multimodal baseline, i.e. the Contrastive-MCR model.

Ablation Study
To verify the impact of our proposed (1) gradient-boosted decision tree regressor, (2) listwise ranking loss, and (3) listwise attention network, we conduct ablation experiments on the Home category of the Amazon and Lazada datasets.

GBDT Regressor. In this ablation, we substitute our tree-based score predictor with various FCNN score regressors. Specifically, we describe each substitution with the sequence of dimensions of its fully-connected layers, where each hidden layer is furnished with a Tanh activation function.
As shown in Table 3, FCNN-based score regressors considerably hurt the MRHP performance, with a decline of 16.7 NDCG@3 points and 6.9 MAP points on the Amazon and Lazada datasets, respectively. One potential explanation is that without the decision tree predictor, the model lacks the partitioning ability to segregate the features of helpful and unhelpful reviews.

Listwise Ranking Loss. As can be observed in Table 3, replacing the listwise objective with the pairwise one degrades the MRHP performance substantially, with a drop of 11.8 NDCG@3 points on Amazon and 7.3 NDCG@5 points on the Lazada dataset.

Listwise Attention Network. Removing the listwise attention network likewise degrades performance on Amazon and Lazada MRHP, respectively. We can attribute the improvement to the advantage of listwise attention, i.e., supplying the MRHP model with the context among product reviews to assist the model in inferring the reviews' ranking positions more precisely.

Case Study
In Figure 1, we present helpfulness prediction results produced by our proposed MRHP model and by Contrastive-MCR (Nguyen et al., 2022), the previous best baseline. While our model is capable of producing helpfulness scores that evidently separate helpful from unhelpful product reviews, the scores generated by Contrastive-MCR intermingle them. Hypothetically, our method partitions product reviews according to their encoded helpfulness features to obtain this inherent separation. We provide more detailed analysis of the partitioning capability of our model and of individual produced helpfulness scores in Appendices D and E.

Related Work
For real-world applications, existing methods are oriented towards extracting hidden features from input samples (Kim et al., 2006; Krishnamoorthy, 2015; Liu et al., 2017; Chen et al., 2018; Nguyen et al., 2021). In this work, we seek to address the aforementioned issues for the MRHP system with our proposed tree-based helpfulness predictor and listwise architectural framework.

Conclusion
In this paper, we introduce a novel framework for the MRHP task that takes advantage of the partitioned structure of product review inputs and the ranking nature of the problem. Regarding the partitioned preference, we propose a gradient-boosted decision tree to route review features towards proper helpfulness subtrees managed by decision nodes. For the ranking nature, we propose a listwise attention network and a listwise training objective to capture the list-contextualized ranking context among reviews. Comprehensive analysis provides both theoretical and empirical grounding of our approach in terms of model generalization. Experiments on two large-scale MRHP datasets showcase the state-of-the-art performance of our proposed framework.

Limitations
Firstly, from the technical perspective, we have advocated the advantages of our proposed listwise loss for the MRHP task in terms of generalization capacity. Nevertheless, there are various other listwise discrimination functions that may prove beneficial for MRHP model training, for example NeuralNDCG (Pobrotyn and Białobrzeski, 2021), ListMLE (Xia et al., 2008), etc. Moreover, despite the novelty of our proposed gradient-boosted tree in partitioning product reviews into helpful and unhelpful groups, our method does not employ prior contrastive representation learning, whose objective is also to segregate helpful and unhelpful input reviews. The contrastive technique might discriminate reviews of distinctive helpfulness features to bring further performance gains to multimodal review helpfulness prediction. At the moment, we leave the exploration of different listwise discrimination functions and contrastive learning as a prospective future research direction.
Secondly, our study can be extended to other problems which involve ranking operations. For instance, in recommendation, there is a need to rank items according to their appropriateness to present to customers in a rational order. Our gradient-boosted decision tree could divide items into corresponding partitions so that we can recommend products to the customer from the highly appropriate partition to the less appropriate one. Therefore, we will explore the applicability of our proposed architecture in such promising problem domains in our future work.

A Proofs

Lemma 1. Given the listwise loss L_list on the total training set, where P denotes the product set, L_list is convex and γ_list-Lipschitz with respect to f′_{i,j}.

Proof. Taking the second derivative of Equation (33), we find it is non-negative, proving the convexity of L_list. The Lipschitz property of L_list can be derived from the corresponding property of the logarithm function, where the first inequality stems from the concavity of log. Applying the above result for L_list, multiplying both sides by y_{i,j}, and integrating the summation over all inequalities for i ∈ {1, 2, . . ., |P|} and j ∈ {1, 2, . . ., |R_i|}, we ultimately obtain the bound with γ_list = γ. This proves the γ_list-Lipschitz property of L_list.
Lemma 2. Given the pairwise loss L_pair on the total training set, where r^+, r^− denote two random indices in R_i with y_{i,r^+} > y_{i,r^−}, L_pair is convex and γ_pair-Lipschitz. Employing the summation of the inequality over all i ∈ {1, 2, . . ., |P|} proves the convexity of L_pair.
Regarding the Lipschitz property, we first show that h^{pair}_i holds the property, since we take the non-negative values in (41). Similarly, applying the aforementioned observation and combining (42) and (43) leads to a bound such that γ_pair ≥ 1. Adopting the summation of (44) over all i ∈ {1, 2, . . ., |P|}, we obtain (45). The Lipschitz property of L_pair follows from result (45).
Theorem 2. Let L_list and L_pair be γ_list-Lipschitz and γ_pair-Lipschitz, respectively. Then, γ_list ≤ γ_pair.

Proof. In order to prove Theorem (2), we first need to find the formulations of γ_list and γ_pair. We leverage the following lemma:

Lemma 3. A function L is γ-Lipschitz if γ satisfies the condition given in (Akbari et al., 2021).

With this foundation in mind, we take the derivatives of L_list and L_pair; (48) and (49) imply (50). Combining equation (50) and Lemma (3), we obtain γ_list ≤ γ_pair. ■

Theorem 3. Let 0 ≤ L_list ≤ L̄_list and 0 ≤ L_pair ≤ L̄_pair. Then, L̄_list ≤ L̄_pair.

B Dataset Statistics
In this section, we provide the dataset statistics of the Amazon and Lazada datasets on the MRHP task. All of the numerical details are included in Table 5.

C Generalization Errors of the Models trained with Listwise and Pairwise Ranking Losses
In this Appendix, we illustrate the empirical evolution of generalization errors of pairwise-trained and listwise-trained models on the remaining categories of the Amazon-MRHP and Lazada-MRHP datasets.
The discovered characteristics regarding generalization in Figures 5 and 6 agree with those in Section 4.6, corroborating the enhanced generalizability of our proposed listwise ranking loss.

D Analysis of Partitioning Function of Gradient-Boosted Decision Tree
We examine the partitioning operation of our proposed gradient-boosted decision tree for multimodal review helpfulness prediction. In particular, we inspect the probabilities μ = [μ_1, μ_2, . . ., μ_{|L|}], which route review features to the target leaf nodes in a soft manner. Our procedure is to gather μ at the leaf nodes for all reviews, estimate their mean value with respect to each leaf, and plot the results on the Clothing and Home categories of the Amazon and Lazada datasets in Figures 7, 8, 9, 10, and 11.
From the figures, we observe that our proposed gradient-boosted decision tree assigns high routing probabilities {μ_i}_{i=1}^{|L|} to different partitions of leaf nodes, with the partitions varying according to the helpfulness scale of the product reviews. In consequence, we can claim that our GBDT divides the product reviews into partitions corresponding to their helpfulness degrees, thus supporting the partitioned preference of the input reviews.
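The bookkeeping behind this analysis can be sketched as follows. The toy μ vectors below are fabricated purely to show the procedure (collect per-review routing distributions, average per leaf within each label group); they are not real model outputs.

```python
import numpy as np

# Sketch of the Appendix-D analysis: average the leaf-routing
# distribution mu over all reviews sharing the same helpfulness label.

mus = {   # label -> list of mu vectors over |L| = 4 leaves (fabricated)
    0: [np.array([0.70, 0.20, 0.05, 0.05]),
        np.array([0.60, 0.30, 0.05, 0.05])],
    4: [np.array([0.05, 0.05, 0.30, 0.60]),
        np.array([0.10, 0.10, 0.20, 0.60])],
}
mean_mu = {label: np.mean(vs, axis=0) for label, vs in mus.items()}

# Unhelpful (label 0) mass concentrates on early leaves, helpful
# (label 4) on late leaves: the tree partitions by helpfulness.
print(int(mean_mu[0].argmax()), int(mean_mu[4].argmax()))  # 0 3
```

In the paper's figures, a plot of these per-label mean distributions shows the probability mass shifting across leaves as the helpfulness label increases.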

E Examples of Product and Review Samples
We present product and review samples in Figure 1, comprising their textual and visual content, with the helpfulness scores generated by Contrastive-MCR (Nguyen et al., 2022), whose score predictor is FCNN-based, and by our GBDT-based model.

NN-based Score vs. Tree-based Score

Review 6 - Label: 1 - NN: 0.044, Tree: -0.778. I hate going through the hassle of returning things but it had to be done with this purchase.

Review 7 - Label: 1 - NN: 0.684, Tree: -0.800. The short glasses are nice, but the tall ones break easily. SUPER easily. I had two of them break just by holding them. I will absolutely not be reordering this.

Review 8 - Label: 1 - NN: 0.443, Tree: -0.897. I love these. We had them in a highly stylized Japanese restaurant and were psyched to find them here. Tall glasses have a "seam". No tipping or breakage yet as mentioned by other reviewers.

Review 1 - Label: 1 - NN: 0.281, Tree: -0.192. I really loved this and used it to carry my laptop to and from work. I used the cross-body strap. However, the metal hardware of the strap broke after three months, and the stitching where the cross-body strap attached to the purse ripped off the same week. Love this, but the handles are too short for me to wear comfortably without the cross-body strap.

Review 2 - Label: 1 - NN: 2.938, Tree: -0.138. Hello, I am Alicia and work as a researcher in the health area. Moreover, I was looking for a feminine, classical and practical bag-briefcase for my work. I would like to begin with the way you show every product. I love when I can see the inner parts and the size of the bag, not only using measures but when you show a model using the product too. Also, the selection of colour is advantageous, a big picture with the tone selected. There are many models, sizes and prices. I consider that is a right price for the quality and design of the product. The products I bought have a high-quality appearance, are professional and elegant, like in the pictures! I was not in a hurry, so I was patient, and the product arrived a couple of days before the established date. The package was made thinking in the total protection of every product I bought, using air-plastic bubbles and a hard carton box. Everything was in perfect conditions. I use them for every day-work, it is very resistant, even in rain time I can carry many things, folders and sheets of paper, a laptop. Their capacity is remarkable. The inner part is very soft and stands the dirty. I am enjoying my bags! All the people say they are gorgeous!

Review 3 - Label: 1 - NN: 0.460, Tree: -0.226. This purse has come apart little by little within a month of receiving it. First the thread that held on the zipper began to unravel. Then the decorative seam covering began to come off all over the purse. Yesterday I was on my way into the grocery and the handle broke as I was walking. I've only had it a few months. Poorly made.

Review 4 - Label: 1 - NN: -0.646, Tree: -0.067. I bought this because of reviews but i am extremely disappointed... This bag leather is too hard and i don't think i will use it

Review 5 - Label: 2 - NN: 5.094, Tree: -0.493. There are slight scratches on the hardware, otherwise great size and it's a gorgeous bag. Got it for use while I'm in a business casual environment.

Review 8 - Label: 3 - NN: 0.259, Tree: 0.939. This bag is perfect! It doubles as somewhat of a "briefcase" for me, as it fits my IPad, planner, and files, while still accommodating my wallet and normal "purse" items. My only complaint was that there were scratches already on the gold metal accents when I unwrapped it from the packaging. Otherwise, great deal for the price!

Review 9 - Label: 2 - NN: 2.695, Tree: 0.462. I believe this is the most expensive looking handbag I have ever owned. When your handbag comes in its own bag, you are on to something wonderful. I also purchased a router in the same order, and I'm serious, the handbag was better wrapped and protected. Now for a review: The handbag is stiff, but I expected that from other reviews. The only reason I didn't give a five star rating is because it is not as large as I hoped. A laptop will not fit. Only a tablet. This is a regular good size purse, so don't expect to be able to carry more than usual. I probably won't be able to use it for my intended purpose, but it is so beautiful, I don't mind.

Review 10 - Label: 1 - NN: -0.235, Tree: -0.189. Look is great, can fit HP EliteBook 8470p (fairly bulky laptop, 15 inch), but very snug. I can only fit my thin portfolio and the laptop into the bag.

Review 11 - Label: 1 - NN: 6.290, Tree: -0.194. This bag is really great for my needs for work, and is cute enough for every day. Other reviews are correct that this is a very stiff-leather bag, but I am fine with that. I love the color and the bag is super adorable. I get so many compliments on this. Also, I travelled recently and this was a perfect bag to use as your "personal item" on the airplane: it zips up so you don't have to worry about things falling out and is just right for under the seat. I love the options of having handles AND the long strap. I carry an Iphone 6+ (does not fit down in the outside pocket completely but I use the middle zipper pouch for my tech), wallet, glasses, sunglasses, small makeup bag, a soapdish sized container that I use for holding charger cords (fits perfect in the inside liner pockets), and on the other side of the zipper pouch I carry an A5-sized Filofax Domino.

Review 12 - Label: 3 - NN: 2.262, Tree: 0.923. Absolutely stunning and expensive looking for the price. I just came back from shopping for a tote bag at Macy's and so I had the chance to look and feel at all the different bags, both high end brand names and generic. This has a very distinguished character to it. A keeper. The size is rather big for an evening out as long as it is not a formal one. I like that it can accommodate a tablet plus all other things we women consider must haves. The silver metal accents are just of enough amount to give it ump but not superfluous to make it look tacky. The faux ostrich material feels so real. The whole bag is very well balanced. Inside it has two zippered pockets and two open pockets for cell phone and sunglasses. Outside it has one zippered pocket by the back. I won't be using the shoulder strap too much as the handles are long enough to be carried on the shoulders.

Review 13 - Label: 4 - NN: 7.685, Tree: 1.969. I added pictures. I hate the fact that people selling things do not give CLEAR defined pictures. This purse was well shipped. Not one scratch... and I don't think there COULD have been a scratch made in shipping. The handles and the bottom are a shiny patent leather look. The majority of the case is a faux ostrich look. It has a 'structure' to it. Not a floppy purse. There is a center divider that is soft and has a zipper to store things. One side (inside) has two pockets that do not zipper. One side (inside) has a zippered pocket. It comes with a long shoulder strap. Please see my photos. So far I really like this purse. The water bottle is a standard 16.9oz.

Review 14 - Label: 2 - NN: 2.309, Tree: 0.584. Love this purse! When I opened the package it seemed like I was opening a purse I had purchased for $450.00, it was packaged so nicely!! Every little detail of the purse was covered for shipping protection. This was/is extremely impressive to me for a purse I paid less than $40.00 for. Wow.

Figure 1: Examples of helpfulness scores produced by score regressors built upon a neural network and a gradient-boosted decision tree. We present the content of the product and review samples in Appendix E.
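As a concrete illustration of how such predicted scores induce a review ranking, the sketch below sorts one product's reviews by score and computes average precision, the building block of the MAP metric reported later. The `threshold` used to binarize the integer helpfulness labels is an assumption for illustration, not the benchmarks' official protocol.

```python
def average_precision(scores, labels, threshold=2):
    """Rank reviews by predicted score (descending) and compute AP,
    treating labels >= threshold as helpful (an illustrative choice)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] >= threshold:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Labels and tree-based scores of reviews 3-5 from Figure 1:
print(average_precision([-0.226, -0.067, -0.493], [1, 1, 2]))
```

Under this binarization, only review 5 (label 2) counts as helpful; since its tree-based score ranks it last of the three, the AP is 1/3.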

Figure 2: Illustration of our Multimodal Review Helpfulness Prediction model.

Theorem 4. Consider two models $f^{\text{list}}_D$ and $f^{\text{pair}}_D$ under common settings, trained to minimize $\mathcal{L}_{\text{list}}$ and $\mathcal{L}_{\text{pair}}$, respectively, on dataset $D$.
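For intuition, the two objectives contrasted in the theorem can be written as a minimal sketch, assuming a softmax cross-entropy form for the listwise loss and a margin form for the pairwise loss; the paper's exact formulations of $\mathcal{L}_{\text{list}}$ and $\mathcal{L}_{\text{pair}}$ may differ.

```python
import math

def listwise_loss(scores, labels):
    # Softmax cross-entropy over the full review list: the target
    # distribution is the label vector normalized to sum to 1.
    z = max(scores)
    log_norm = z + math.log(sum(math.exp(s - z) for s in scores))
    total = sum(labels) or 1
    return -sum((l / total) * (s - log_norm)
                for s, l in zip(scores, labels))

def pairwise_loss(scores, labels, margin=1.0):
    # Margin loss averaged over every pair (i, j) with labels[i] > labels[j].
    pairs = [(i, j) for i in range(len(labels)) for j in range(len(labels))
             if labels[i] > labels[j]]
    return sum(max(0.0, margin - (scores[i] - scores[j]))
               for i, j in pairs) / max(len(pairs), 1)
```

The listwise loss scores the entire list jointly, while the pairwise loss only ever compares two reviews at a time, which is the structural difference the theorem and the generalization analysis below build on.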

Figure 3: Generalization error curves per training epoch on the Electronics category in the Amazon-MRHP dataset.

Figures 3 and 4 illustrate the approximation of the generalization error $\hat{E}(f^{\theta}_D) = R_{\text{val}}(f^{\theta}_D) - R_{\text{train}}(f^{\theta}_D)$ of the model after every epoch, where $R_{\text{val}}$ and $R_{\text{train}}$ denote the average loss values of the trained model $f^{\theta}_D$ on the validation and training sets, respectively. Because the loss values have different scales, we normalize them to the range $[0, 1]$. The plots demonstrate that the generalization errors of our MRHP model trained with the listwise ranking loss are consistently lower than those obtained with pairwise loss training, exhibiting better generalization performance. Additionally, as further shown in Table 4, $f^{\theta,\text{list}}_D$ incurs a smaller training-testing performance discrepancy $\Delta_{\text{MAP}} = |\text{MAP}_{\text{training}} - \text{MAP}_{\text{testing}}|$ than $f^{\theta,\text{pair}}_D$, which, together with Figures 3 and 4, empirically substantiates Theorem 4.
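The per-epoch gap plotted in these figures can be reproduced with a small helper. The joint min-max normalization below is one plausible reading of the normalization step described above; the paper may normalize each loss curve differently.

```python
def generalization_error_curve(train_losses, val_losses):
    """Min-max normalize per-epoch train/val losses to [0, 1]
    (jointly over both curves, an assumption) and return the
    per-epoch gap R_val - R_train."""
    lo = min(train_losses + val_losses)
    hi = max(train_losses + val_losses)
    span = (hi - lo) or 1.0
    norm = lambda xs: [(x - lo) / span for x in xs]
    t, v = norm(train_losses), norm(val_losses)
    return [vi - ti for ti, vi in zip(t, v)]
```

A flatter, lower curve from this helper corresponds to the smaller generalization error observed for the listwise-trained model.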

Figure 4: Generalization error curves per training epoch on the Electronics category in the Lazada-MRHP dataset.

Figures 5 and 6: Generalization error curves per training epoch on the Clothing category in the Amazon-MRHP and Lazada-MRHP datasets.

Figure 7: Mean $\mu_i$ routing probabilities at the proposed GBDT's leaves for 1-rating and 2-rating reviews in the Amazon-Home dataset.

Figure 8: Mean $\mu_i$ routing probabilities at the proposed GBDT's leaves for 3-rating and 4-rating reviews in the Amazon-Home dataset.
Figure 9: Mean $\mu_i$ routing probabilities at the proposed GBDT's leaves for 0-rating and 1-rating reviews in the Lazada-Clothing dataset.

Figure 10: Mean $\mu_i$ routing probabilities at the proposed GBDT's leaves for 2-rating and 3-rating reviews in the Lazada-Clothing dataset.

Figure 11: Mean $\mu_i$ routing probabilities at the proposed GBDT's leaves for 4-rating reviews in the Lazada-Clothing dataset.
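The $\mu_i$ quantities visualized in Figures 7-11 are leaf routing probabilities of a soft (differentiable) decision tree. A generic sketch, assuming sigmoid decisions at internal nodes stored in heap order; the paper's GBDT may parameterize the routing differently:

```python
import math

def leaf_routing_probs(x, weights, biases, depth):
    """Soft routing in a full binary tree of the given depth: internal
    node n routes right with probability sigmoid(w_n . x + b_n), and
    mu_i for leaf i is the product of decision probabilities along its
    root-to-leaf path (a generic differentiable-tree sketch)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    # right-branch probability at each internal node (heap order)
    d = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
         for w, b in zip(weights, biases)]
    mus = []
    for leaf in range(2 ** depth):
        mu, node = 1.0, 0
        for level in reversed(range(depth)):
            go_right = (leaf >> level) & 1
            mu *= d[node] if go_right else (1.0 - d[node])
            node = 2 * node + 1 + go_right
        mus.append(mu)
    return mus
```

Because each $\mu_i$ is a product of probabilities and the leaves partition the input space softly, the $\mu_i$ values for any input sum to 1, which is why the figures can be read as distributions over leaves per rating class.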
…true that the taller 18-oz glasses are delicate. If you're the kind of person who buys glassware expecting every glass to last 20 years, this set isn't for you. If you're the kind of person who enjoys form over function, I'd highly recommend them.

Review 10 - Label: 1, NN-based score: 6.074, tree-based score: -0.844. Quality is good. Does not hold water from the underside if you put it in the dishwasher.

Review 11 - Label: 1, NN-based score: 2.615, tree-based score: -0.923. I have owned these glasses for 20-plus years. After breaking most of the tall ones, I looked around for months to find great glasses but still thought these were the best, so I bought more.

Review 12 - Label: 3, NN-based score: 7.529, tree-based score: 0.836. I am sooooooo disappointed in these glasses. They are thin. Of course, right after opening we put in the dishwasher and upon taking them out it looked like they were washed with sand! We could even see the fingerprints. And we have a watersoftener! In the photo I have included, this is after one dishwasher washing!
…bag, has no flexibility. Stiff. But I do receive a lot of compliments.

Review 7 - Label: 1, NN-based score: 0.819, tree-based score: -0.284. I love this bag!!! I use it every day at work and it has held up to months of use with no sign of wear and tear. It holds my laptop, planner, and notebooks as well as my large wallet and pencil case. It holds so much! I've gotten so many compliments on it. It feels and looks high quality. It's roomie & has many pockets inside. And med/large purse I'd say, but I like that it's larger in length than height. It's very classic looking yet different with texturing. I always get many compliments on it. Believe me I have Many purses & currently this is one of my favorites!! I have already & will continue to purchase Dasein brand handbags.
…$\times d$, where $l_{p_i}$ and $l_{r_{i,j}}$ denote the sequence lengths of the product and review text, respectively, and $d$ the hidden dimension.
Visual Encoding. We adapt a pre-trained Faster R-CNN to extract ROI features of $m$ objects $\{e_{p_i}^{t}\}_{t=1}^{m}$ and $\{e_{r_{i,j}}$

Table 2: Review helpfulness prediction results on the Lazada-MRHP dataset.

Table 3: Ablation study on the Home category of the Amazon-MRHP and Lazada-MRHP datasets.

From Table 4, we postulate that removing the listwise training objective impairs model generalization, as revealed by the degraded MRHP testing performance.
Listwise Attention Network (LAN). We proceed to ablate our proposed listwise attention module and re-execute the model training. Results in Table 3 show that inserting listwise attention brings about a performance upgrade of 16.9 and 9.1 MAP points in Amazon-MRHP and Lazada-
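A minimal sketch of what list-level attention computes: each review attends over all reviews of the same product, so its representation is contextualized by the entire candidate list. This assumes plain scaled dot-product self-attention; the actual LAN may add learned projections and multiple heads.

```python
import math

def listwise_attention(review_vecs):
    """Scaled dot-product self-attention across the reviews of one
    product (a generic sketch of list-level attention, not
    necessarily the paper's exact LAN architecture)."""
    d = len(review_vecs[0])
    scale = math.sqrt(d)
    out = []
    for q in review_vecs:
        # attention logits of this review against every review in the list
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in review_vecs]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        attn = [e / z for e in exps]
        # convex combination of all review vectors
        out.append([sum(a * v[j] for a, v in zip(attn, review_vecs))
                    for j in range(d)])
    return out
```

This is the sense in which the ablation above matters: without such a module, each review is scored in isolation and the model never sees the ranking context of its competitors.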

Table 4: Training-testing performance of our model trained with listwise and pairwise ranking losses on the Electronics category of the Amazon and Lazada datasets.

Table 5: Statistics of the MRHP datasets. Max #R/P denotes the maximum number of reviews associated with each product.

Table 7: Generated helpfulness scores on reviews 6-12 for product B00005MG3K (Dasein Frame Tote Top Handle Handbags Designer Satchel Leather Briefcase Shoulder Bags Purses).