Automatic Marketing Theme and Commodity Construction System for E-commerce



Introduction
Nowadays, e-commerce platforms have become one of the primary methods of shopping for people. Customers are accustomed to browsing individual products in the traditional recommendation system, as depicted in Figure 1. However, when customers are interested in products with specific marketing themes, the current recommendation system fails to meet their shopping needs, resulting in high user browsing costs. For instance, a customer who wishes to purchase toddler shoes for their baby may be particularly interested in "Warm Toddler Shoes in Winter." They would prefer to browse warm toddler shoes from different brands and with varying prices on a single page, similar to the right image in Figure 1, rather than sifting through a mixture of individual products in the traditional recommendation system, as shown in the left image of Figure 1. Therefore, it is imperative to develop a system that incorporates marketing themes and their corresponding product collections. This system would not only explore the marketing characteristics of products and present them to customers but also assist customers in quickly finding their desired products, thereby enhancing the user experience and conversion rate.
In the e-commerce platform, a marketing theme is a new form of commodity that comprises a list of commodities sharing the same marketing attribute, along with a concise phrase summarizing the common attributes. As depicted in Figure 1, a primary commodity with the marketing theme (highlighted in the red box) is displayed in the recommendation flow page. If customers are attracted to the theme and the primary commodity, they can click on it and be directed to the marketing theme page, where a collection of commodities belonging to the marketing theme will be showcased. Currently, the system relies on experts to manually create marketing themes and select relevant commodities based on their past experience with commodities. They have to choose suitable commodities from a vast pool of millions and come up with an appropriate theme. However, this approach is not only costly and inefficient but also leads to low online indicators for the recommendation system. Hence, there is a growing demand for e-commerce platforms to explore automatic generation methods for constructing marketing themes and commodity systems.
In this study, we present an e-commerce Marketing Theme and Commodity Construction system, known as MTCC, which automatically generates marketing themes and selects relevant commodities. The task is structured as a four-step problem: marketing theme generation, selection of related commodities, simulation of indicators, and rewriting of low-quality themes.
Our contributions are summarized as follows:
• We introduce a marketing theme generation model for the e-commerce domain, with the objective of generating versatile marketing themes for popular products and uncovering the trending and influential aspects of marketing.
• We design a classification method based on a theme-commodity consistency module to select commodities that share the same marketing theme. This module can also be used in other fields, such as search engines.
• We propose an indicator simulation module to evaluate the quality of generated themes offline and to help ensure better online revenue before themes are published.
• We propose a theme rewrite module to rewrite poorly rated themes, which further improves online effectiveness. This module can also serve a personalized recommendation system by constructing themes from the commodity collection in a user's personalized recommendations.
• We present the comprehensive architecture of the deployed system, in which MTCC has been successfully implemented on a real-world e-commerce platform. To the best of our knowledge, this work represents a pioneering effort in automatically constructing marketing themes within an e-commerce platform.

Proposed Method
In this section, we provide the details of our Marketing Theme and Commodity Construction system (MTCC) for e-commerce, as shown in Figure 2. MTCC consists of four main parts: a marketing theme generation module, a related commodities selection module, an indicator simulation module, and a low-quality theme rewriting module. First, a pre-trained language model encodes the information of the commodities, such as their titles and attributes, to generate marketing themes that describe the marketing aspect of these commodities. To obtain attractive themes from the generation process, various strategies are employed to rank the generated themes, such as utilizing popular keywords from the e-commerce commodity search engine. Second, a theme-commodity consistency module is utilized to select commodities that are related to the generated attractive themes.
To ensure more accurate consistency, our model is further enhanced with a pretrained search engine model, which can resolve attribute conflicts, such as those related to color and size. Next, an online indicator simulator, trained with a pairwise loss function, predicts the online indicators of the generated themes as a measure of theme quality. A low predicted score indicates that the generated theme may have a poor online effect and requires rewriting. Finally, another pretrained language model encodes the information of the selected commodities mentioned above, generating candidate themes for the theme-commodity consistency module. This cycle continues until a high-quality marketing theme is produced. To ensure the quality of our system, we also involve manual screening of the generated themes and selected commodities.
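The cycle described above can be illustrated with a minimal control-flow sketch. The four callables stand in for the four modules and are hypothetical placeholders, as are the `threshold` value and the `max_rounds` cap, which we add only so that the loop terminates; none of these names come from the deployed system.

```python
from typing import Callable, List, Optional, Tuple


def construct_theme(
    commodities: List[str],
    generate: Callable[[List[str]], str],
    select: Callable[[str, List[str]], List[str]],
    simulate: Callable[[str], float],
    rewrite: Callable[[List[str]], str],
    threshold: float = 0.5,  # hypothetical quality threshold
    max_rounds: int = 3,     # hypothetical cap on rewrite rounds
) -> Optional[Tuple[str, List[str]]]:
    """Run the generate -> select -> simulate -> rewrite cycle."""
    # Step 1: generate a theme from the hot commodities of a category.
    theme = generate(commodities)
    for _ in range(max_rounds):
        # Step 2: keep only the commodities consistent with the theme.
        selected = select(theme, commodities)
        # Step 3: accept the theme if its predicted indicator is high enough.
        if simulate(theme) > threshold:
            return theme, selected
        # Step 4: otherwise rewrite the theme from the selected commodities.
        theme = rewrite(selected)
    # No theme passed the simulator within the allowed number of rounds.
    return None
```

A result of `None` would mean that no candidate passed the simulator; in either case the output would still go through the manual screening step mentioned above.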

Theme Generation
In this work, we propose generating multiple marketing theme titles for each commodity category based on popular product information. For example, products like "high calcium milk" and "low fat milk" both belong to the commodity category "milk". Our task aims to generate marketing theme titles such as "pure low-fat milk for fitness enthusiasts" and "high calcium milk for children" for the "milk" category.
Specifically, given a commodity category name $c \in C$, we select a group of hot commodities $P_c = \{p_1, p_2, \ldots, p_n\}$, where all commodities in $P_c$ have the same commodity category name $c$. For each commodity $p_i$, $X_i$ is the commodity information of $p_i$, including the commodity title $T$ and attributes $A$, and $Y_i$ is the marketing theme title of $p_i$. For each commodity $p_i$, we aim to learn the model parameters $\theta_1$ and estimate the conditional probability:

$$P(Y_i \mid X_i; \theta_1) = \prod_{t=1}^{|Y_i|} P(y_t \mid y_{<t}, X_i; \theta_1)$$

where $y_t$ is the $t$-th word in $Y_i$ and $y_{<t}$ represents all tokens before position $t$ (i.e. $y_{<t} = (y_1, y_2, \ldots, y_{t-1})$). After that, in inference, for each category $c$ and its commodities set $P_c$, we can generate candidate marketing theme titles $\mathcal{Y}_c = \{Y_1, Y_2, \ldots, Y_m\}$ with our theme generation module.
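To make the token-level factorization concrete, the following sketch computes the log-probability of a theme title from its per-token conditional probabilities. The function name and the example probabilities are illustrative assumptions, not values produced by the model.

```python
import math
from typing import Sequence


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """Return log P(Y | X) = sum over t of log P(y_t | y_<t, X).

    token_probs[t] is the probability the model assigns to the t-th
    token of the theme, given the commodity information and all
    previous tokens.
    """
    return sum(math.log(p) for p in token_probs)


# Three tokens with conditional probabilities 0.5, 0.5 and 0.8:
# the sequence probability is their product, 0.5 * 0.5 * 0.8.
log_p = sequence_log_prob([0.5, 0.5, 0.8])
```

Training with this factorization amounts to minimizing the negative of this quantity over the commodity-theme pairs.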
To select more popular themes, we utilize the hot words from the e-commerce search engine. Given the candidate titles $\mathcal{Y}_c$, the hot score $S_h(Y_i)$ of each candidate $Y_i$ is computed from four quantities: $E_h(\cdot)$, the exposure count of $\cdot$ in the search engine log; $E_c(\cdot)$, the exposure count of $\cdot$ in the click box log; $N(Y_i, P_c)$, the number of occurrences of $Y_i$ in the titles of the commodities $P_c$; and $N(Y_i, P)$, the number of occurrences of $Y_i$ in the titles of all the commodities $P$.
Finally, we pass the top $k$ titles $H_c = \{h_1, h_2, \ldots, h_k\}$, ranked by the hot scores $S_h$ for category $c$, to the next step, where $h_i \in \mathcal{Y}_c$.
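The top-$k$ selection step can be sketched as follows. The sketch assumes the hot scores have already been computed and are passed in as a dictionary; the function name `select_hot_titles` and the example scores are hypothetical.

```python
import heapq
from typing import Dict, List


def select_hot_titles(hot_scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate titles with the highest hot scores.

    hot_scores maps each candidate title of one category to its
    precomputed hot score; the result is ordered from hottest to
    least hot.
    """
    return heapq.nlargest(k, hot_scores, key=hot_scores.get)


# Illustrative scores for the "milk" category (made-up numbers).
candidates = {
    "high calcium milk for children": 0.9,
    "pure low-fat milk for fitness enthusiasts": 0.5,
    "milk": 0.2,
}
top_two = select_hot_titles(candidates, k=2)
```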

Commodity Selection
Given one hot title $h$ of commodity category name $c$ and its whole commodities set $P_c$, we utilize a BERT classification model to select the commodities set $P_c^h = \{p_1, p_2, \ldots, p_K\}$ related to the theme $h$. Specifically, given the commodity information $X_i$ and the theme $h = (y_1, y_2, \ldots, y_m)$, where $y_t$ is the $t$-th word in $h$, we concatenate $X_i$ and $h$ with the separator tag <SEP> as input to the BERT encoder, and utilize the <CLS> embedding to obtain the consistency score. The loss function of the commodity selection module is defined over the theme $h$ and a negative example $h'$, which is randomly sampled from the marketing theme set; $W_1 \in \mathbb{R}^d$, $b_1 \in \mathbb{R}$ and $\theta_2$ are the model parameters.
To obtain more accurate consistency, we also enhance our model with the pretrained search engine model, which can resolve some attribute conflicts, such as color and size. We input the marketing theme $h$ as the query and calculate the ranking score of commodity $p_i$ based on the pretrained e-commerce search engine model. The enhanced consistency score combines the consistency score with this ranking score using an additional parameter. Finally, we pass the top $K$ commodities $P_c^h = \{p_1, p_2, \ldots, p_K\}$, ranked by the enhanced consistency scores for the hot title $h$, to the next step, where $p_i \in P_c$.
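As a rough sketch of the enhancement and selection step, the code below combines a consistency score with a search ranking score and keeps the top $K$ commodities. The weighted-sum form in `enhanced_score`, the `weight` parameter, and all example numbers are assumptions made purely for illustration; they are not the combination used in the deployed system.

```python
from typing import List, Tuple


def enhanced_score(consistency: float, search: float, weight: float) -> float:
    """Combine the two scores.

    ASSUMPTION: a weighted sum, chosen only for illustration.
    """
    return consistency + weight * search


def select_top_commodities(
    scored: List[Tuple[str, float, float]],
    weight: float,
    top_k: int,
) -> List[str]:
    """Rank commodities by enhanced score and keep the best top_k.

    Each entry of `scored` is (commodity_id, consistency_score,
    search_ranking_score) for one theme.
    """
    ranked = sorted(
        scored,
        key=lambda item: enhanced_score(item[1], item[2], weight),
        reverse=True,
    )
    return [commodity_id for commodity_id, _, _ in ranked[:top_k]]


# p2 has a lower consistency score than p1 but a much higher search score.
example = [("p1", 0.9, 0.1), ("p2", 0.6, 0.9), ("p3", 0.2, 0.2)]
selected = select_top_commodities(example, weight=0.5, top_k=2)
```

In this example the search score lifts `p2` above `p1`, which is the kind of correction the search engine model is meant to provide when the classifier alone overlooks an attribute conflict.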

Indicator Simulator
Given the commodity category name $c$ and its corresponding themes $\mathcal{Y} = \{Y_1, Y_2, \ldots, Y_{|\mathcal{Y}|}\}$, we construct positive and negative examples based on the online indicator, specifically the click-through rate (CTR). For instance, if the CTR of $Y_i$ is significantly higher than that of $Y_j$, then $Y_i$ is considered the positive example and $Y_j$ the negative example.
We utilize a BERT classifier optimized with a pairwise loss over a theme and a negative example $Y'$ from the marketing theme set $\mathcal{Y}$; $W_2 \in \mathbb{R}^d$, $b_2 \in \mathbb{R}$ and $\theta_3$ are the model parameters.
In inference, we input the generated theme $h_i$ to the above BERT classifier to obtain its score $s_{h_i}$. When the score $s_{h_i}$ exceeds the threshold, the theme $h_i$ is considered a high-quality theme. Otherwise, it is considered a low-quality theme and is passed to the theme rewriter module.
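The construction of training pairs and the pairwise objective can be sketched as follows. The minimum-exposure filter, the CTR gap used to decide that one theme is "significantly" better, and the logistic form of the pairwise loss are all assumptions for illustration; the text above specifies only that a pairwise loss and a score threshold are used.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def build_ctr_pairs(
    stats: Dict[str, Tuple[int, int]],
    min_exposures: int,   # hypothetical "fully exposed" cutoff
    min_ctr_gap: float,   # hypothetical "significantly higher" margin
) -> List[Tuple[str, str]]:
    """Return (positive_theme, negative_theme) pairs for one category.

    stats maps each theme to (clicks, exposures).
    """
    ctr = {
        theme: clicks / exposures
        for theme, (clicks, exposures) in stats.items()
        if exposures > 0 and exposures >= min_exposures
    }
    pairs = []
    for a, b in combinations(ctr, 2):
        if ctr[a] - ctr[b] >= min_ctr_gap:
            pairs.append((a, b))
        elif ctr[b] - ctr[a] >= min_ctr_gap:
            pairs.append((b, a))
    return pairs


def pairwise_logistic_loss(pos_score: float, neg_score: float) -> float:
    """One common pairwise loss: log(1 + exp(-(s_pos - s_neg)))."""
    return math.log1p(math.exp(-(pos_score - neg_score)))


def is_high_quality(score: float, threshold: float) -> bool:
    """Themes scoring above the threshold skip the rewriter."""
    return score > threshold
```

The loss is small when the positive theme is scored well above the negative one and grows when the order is reversed, so minimizing it teaches the classifier to rank themes by expected CTR.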

Theme Rewriter
In this phase, we use the UniLM model to rewrite low-quality themes. Given a theme $Y$ and its related commodity set $P^Y = \{p_1, p_2, \ldots, p_K\}$, we first calculate the consistency score between each of these commodities and the theme. Then, in descending order of this consistency score, we gradually input the commodity information $X_1, \ldots, X_K$ into the UniLM generation model for the marketing theme $Y$. For example, the theme "High calcium milk" has three related commodities P1, P2 and P3, whose consistency scores decrease in that order. First, we input the commodity information $X_1$ of P1 into UniLM, and take the theme $Y$ as the output. Then we concatenate the commodity information of P1 and P2 as $X_1; X_2$, input it into UniLM, and take the theme $Y$ as the output. Finally, we concatenate the commodity information of P1, P2 and P3 as $X_1; X_2; X_3$, input it into UniLM, and take the theme $Y$ as the output. The optimization goal is defined as:

$$P(Y \mid X_1; \ldots; X_k) = \prod_{t=1}^{|Y|} P(y_t \mid y_{<t}, X_1; \ldots; X_k)$$

where $X_j$, $j \le k$, is the $j$-th related commodity in $P^Y$, $y_t$ is the $t$-th word in $Y$, and $y_{<t}$ represents all tokens before position $t$ (i.e. $y_{<t} = (y_1, y_2, \ldots, y_{t-1})$).
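The progressive concatenation in the example above can be sketched as a small data-construction routine. The function name, the `" ; "` separator string, and the placeholder commodity texts are illustrative assumptions.

```python
from typing import List, Tuple


def build_rewrite_pairs(
    theme: str,
    scored_commodities: List[Tuple[str, float]],
    sep: str = " ; ",  # hypothetical separator between commodities
) -> List[Tuple[str, str]]:
    """Build (input_text, target_theme) pairs for the rewriter.

    scored_commodities holds (commodity_info, consistency_score).
    Commodities are sorted by score from high to low, and the k-th
    pair concatenates the information of the top k commodities.
    """
    ranked = sorted(scored_commodities, key=lambda item: item[1], reverse=True)
    infos = [info for info, _ in ranked]
    return [(sep.join(infos[: k + 1]), theme) for k in range(len(infos))]


pairs = build_rewrite_pairs(
    "High calcium milk",
    [("X2", 0.7), ("X1", 0.9), ("X3", 0.4)],
)
```

With three related commodities this yields three training pairs whose inputs are `X1`, `X1 ; X2` and `X1 ; X2 ; X3`, each paired with the same target theme.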

Deployment
We have successfully deployed the proposed MTCC system on a real e-commerce platform. Figure 3 shows the workflow of the deployed system, which is updated weekly. Since there are thousands of commodities online, we update the model and rebuild the system every week to find novel marketing themes. To ensure a good user experience, every marketing theme must be manually filtered before being published online.

Dataset and Settings
Dataset The data for the theme generation model are obtained from human-created and reviewed commodity-theme pairs, collected from a publicly available online e-commerce platform, JD.com*. The commodity information contains the commodity title, category name $c$ and attributes.
Fine-tuning pre-trained models on domain data has been shown to yield better in-domain performance. In the e-commerce scenario, by fine-tuning with e-commerce data, the pre-trained model can better understand the commodities and summarize their attribute information, so as to produce more appropriate marketing themes. Specifically, we use UniLM-BERT as the backbone structure, and utilize the commodity title, short title, and attribute information as input. We randomly mask the input and recover the masked tokens as a sub-task. In addition, we design a consistency classification task: given a commodity title-attribute pair, this task aims to classify whether the pair refers to the same commodity or not. For positive examples, attributes and titles describe the same commodity, and the attributes are concatenated in random order to introduce disorder noise. For negative examples, we randomly select attributes from different commodities. From this, we obtain our E-commerce UniLM model.
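The example construction for the consistency classification task can be sketched as below. The tuple layout, the use of one negative per commodity, and the space-joined attribute string are assumptions for illustration only.

```python
import random
from typing import List, Tuple


def make_consistency_examples(
    commodities: List[Tuple[str, List[str]]],
    rng: random.Random,
) -> List[Tuple[str, str, int]]:
    """Build (title, attribute_text, label) examples.

    Each commodity is (title, attributes). A positive example pairs
    the title with its own attributes in shuffled order (disorder
    noise); a negative example pairs it with the attributes of a
    different, randomly chosen commodity.
    """
    examples = []
    for idx, (title, attributes) in enumerate(commodities):
        shuffled = attributes[:]
        rng.shuffle(shuffled)
        examples.append((title, " ".join(shuffled), 1))

        others = [attrs for j, (_, attrs) in enumerate(commodities) if j != idx]
        if others:
            negative = rng.choice(others)[:]
            rng.shuffle(negative)
            examples.append((title, " ".join(negative), 0))
    return examples


catalog = [
    ("low fat milk", ["fat:low", "size:1L"]),
    ("toddler shoes", ["color:red", "season:winter"]),
]
examples = make_consistency_examples(catalog, random.Random(0))
```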

Results
From the results, we can see that well-designed pre-trained language models, such as T5 and UniLM-BERT, perform relatively better than BART. With the introduction of e-commerce fine-tuning, our E-commerce UniLM model outperforms all baseline models. In the e-commerce scenario, by fine-tuning with e-commerce data, the pre-trained model can better understand the commodities and summarize their attribute information, so as to produce more appropriate marketing themes. In summary, our E-commerce UniLM model is able to generate more fluent and relevant themes than the baselines.
* https://www.jd.com/

Dataset and Settings
Human-written Short Title Dataset On the e-commerce website, some commodities have corresponding short titles that are written by humans. Since the consistency between short titles and commodities is similar to that between themes and commodities, we can utilize this data to train our commodity selection module for classifying the consistency of commodities and themes. We collected 3 million human-written short title-commodity pairs from JD.com as positive examples. Additionally, we randomly selected 10 short titles of other commodities in the same category $c$ as negative examples. The data were randomly split into a training set, a validation set, and a test set in a ratio of 8:1:1.
Baselines We compare different text classification models, including BERT, RoBERTa and our E-commerce UniLM. To validate the effectiveness of our model, we also compare with the pretrained e-commerce search engine (SEARCH) and the enhanced model (MTCC).
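The dataset construction described above can be sketched as follows, assuming that negatives are drawn per positive pair and that the split is taken over the shuffled examples; both are our reading rather than a confirmed detail, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Example = Tuple[int, str, int]  # (commodity_id, short_title, label)


def build_selection_dataset(
    pairs: List[Tuple[int, str, str]],
    num_negatives: int,
    rng: random.Random,
) -> List[Example]:
    """pairs holds (commodity_id, category, short_title) positives.

    For each positive, sample up to num_negatives short titles of
    other commodities in the same category as negatives.
    """
    titles_by_category = defaultdict(list)
    for commodity_id, category, title in pairs:
        titles_by_category[category].append((commodity_id, title))

    examples: List[Example] = []
    for commodity_id, category, title in pairs:
        examples.append((commodity_id, title, 1))
        candidates = [
            t for other_id, t in titles_by_category[category]
            if other_id != commodity_id
        ]
        count = min(num_negatives, len(candidates))
        for negative in rng.sample(candidates, count):
            examples.append((commodity_id, negative, 0))
    return examples


def split_8_1_1(examples: List[Example], rng: random.Random):
    """Shuffle and split into train/validation/test at 8:1:1."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )
```

Sampling negatives within the same category makes them harder than random titles, since they share most category-level vocabulary with the positive.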

Results
From the results, we can see that our fine-tuned UniLM model can capture more features from the e-commerce data. The search engine performs better than the baselines, because the pretrained search model has more training data from the online app and is able to recognize the characteristics of commodities. With the enhancement of the search engine, our MTCC model outperforms all baseline models. In summary, our MTCC model is able to select more consistent commodities than the baselines.

Dataset and Settings
Dataset To simulate the online indicator of a marketing theme, we mine the marketing theme online log, which records the real user browsing and clicking data of marketing themes. For the same category name, when two themes are both fully exposed and the CTR of theme $Y_1$ is larger than that of theme $Y_2$, $Y_1$ is taken as the positive example and $Y_2$ as the negative example.
Baselines We compare different text classification models, including BERT, RoBERTa and our E-commerce UniLM.

Results
From the indicator simulator results, we can see that the E-commerce UniLM model performs relatively better than BERT and RoBERTa, which demonstrates that our fine-tuned UniLM model can capture more features from the e-commerce data. In summary, our E-commerce UniLM model is able to predict the CTR of a marketing theme more accurately.

Dataset and Settings
Dataset The data for training the rewriting generation model are the same as those of the theme generation module. We concatenate the commodity information in the dataset to construct the training pairs. The training, validation, and test sets contain 2.4 million, 300 thousand, and 300 thousand pairs, respectively.

Results
From the theme rewriter results, we can see that our E-commerce UniLM model outperforms all baseline models. In summary, our E-commerce UniLM model is able to generate more fluent and relevant themes than the baselines.

Online A/B Test
To demonstrate the effectiveness of the MTCC system in real-world applications, we conducted standard A/B tests to evaluate the benefits of deploying marketing themes on e-commerce mobile applications. The CTR of AI-generated marketing themes increased by 11% compared to human-produced marketing themes, which shows the value of AI marketing themes. Moreover, AI production can reach 10 times the efficiency of manual production, which greatly reduces the cost of manual production and improves system efficiency.

Conclusion
In this work, we propose a new marketing theme and commodity construction system, named MTCC, which automatically generates marketing themes and collects commodities under each specific marketing theme. It greatly reduces users' shopping costs and improves the user click-through rate. In order to achieve better online results, MTCC includes four modules, namely theme generation, commodity consistency, indicator simulator, and theme rewriting. Extensive offline experiments and online A/B tests demonstrate that MTCC can not only generate popular marketing themes and automatically select related items, but also improve online effectiveness in recommender systems. In the future, we can incorporate user shopping preference information into the production process to generate personalized marketing themes.

Limitations
The marketing theme generation and related product matching system (MTCC) we built can automatically and efficiently generate information-rich marketing themes and select the products that belong to each theme. By intelligently generating novel themes, it helps gather and explore users' interests along their usage path and improves user satisfaction, push message delivery, and the efficiency of shopping decisions. At present, our system combines user interests for instant recommendation of themes of interest, but users' shopping information is not included in the production process. We can therefore incorporate users' shopping information into the theme production and product selection process to generate more personalized marketing themes, thereby deepening the personalized experience and assisting users in purchasing decisions. Organizing products and materials based on "user shopping interests" would more effectively help users freely identify, search, and customize their interests.

Figure 1: A sample of marketing theme.

Table 1: The results of different theme generation models.

Table 2: The results of the commodity selection model.

Table 3: The results of the indicator simulator model.

Table 4: The results of different theme rewriter models.