OpinionConv: Conversational Product Search with Grounded Opinions

When searching for products, the opinions of others play an important role in making informed decisions. Subjective experiences about a product can be a valuable source of information. This is also true in sales conversations, where a customer and a sales assistant exchange facts and opinions about products. However, training an AI for such conversations is complicated by the fact that language models do not possess authentic opinions for their lack of real-world experience. We address this problem by leveraging product reviews as a rich source of product opinions to ground conversational AI in true subjective narratives. With OpinionConv, we develop the first conversational AI for simulating sales conversations. To validate the generated conversations, we conduct several user studies showing that the generated opinions are perceived as realistic. Our assessors also confirm the importance of opinions as an informative basis for decision making.


Introduction
In order to elucidate the mechanics of conversational product search, Kotler and Keller (2015) delineated a five-stage process that encapsulates customer decision making (see Figure 1, left). This process suggests that the customer: (1) recognizes a problem or need; (2) searches for information about potential products or services that could resolve the problem or fulfill the need, filtering them until a manageable set of alternatives remains; (3) evaluates and compares these alternatives against each other with regard to personal preferences and third-party opinions to inform their decision making; (4) proceeds to make a purchase decision predicated upon this informed evaluation; and finally, (5) exhibits post-decision behaviors that reflect their satisfaction, which completes the process.
Typically, in-store shopping predominantly engages the second and third stages of this customer decision process. Both the activities of reducing the number of alternatives and evaluating their merits and demerits are conducted in conversations between customers and sales assistants. The absence of such interactions in online environments is perceived as a deficiency in customer service, especially with respect to the third stage (Exalto et al., 2018). Customers derive post-purchase satisfaction from personal exchanges, relating to others' experiences, and having the opportunity to ask questions (Papenmeier et al., 2022). The considerable number of online product reviews available is not a substitute for everyone, since many customers lack the patience to examine many of them, leading to post-purchase dissatisfaction and product returns. Conversational AI has been suggested as a solution (Gnewuch et al., 2017), with the goal of mimicking the conversational strategies of sales assistants (Papenmeier et al., 2022). But despite its importance, previous research on conversational product search almost entirely neglects the third stage, or rather its opinionated aspects (Section 2).
Recent advances in large-scale conversational language models, spearheaded by OpenAI's ChatGPT, are driving a paradigm shift in the development of conversational technologies. Nonetheless, when it comes to expressing opinions pertaining to real-world events or entities, these language models lack the necessary grounding in tangible reality. For an individual to formulate an opinion on a particular subject matter, they require exposure to the subject to relate the new experience to past ones, and, importantly, an emotional perception. A language model is only capable of generating what might be termed a "statistical average" of third-party opinions, if they have been part of its training data. In the context of product search, such opinions would be deemed inauthentic, as they are not based on real-world experiences or substantiated knowledge. This lack of authenticity poses challenges to the effective utilization of these models when (personal) opinions play an important role.

[Figure 1: left, the customer decision process (Kotler and Keller, 2015), from problem/need recognition to purchase decision; right, a negotiation strategy following Conversation Template 4: a sequence of questions and answers between sales assistant S and customer C about product features until a small set of alternatives remains, after which S makes an offer, C asks for cheaper options and S makes a cheaper offer, C asks about product features and alternatives, and S responds with grounded positive and negative opinions.]

In this paper, we focus on the third stage of the customer decision process, for which we contribute the first approach to generate grounded opinionated statements (Section 3). We conceive and operationalize the generation of grounded opinions by positing that a grounded opinion about a product is an opinion which has been verifiably expressed by at least one individual in a product review that specifically discusses the product under scrutiny. Our approach, OpinionConv, combines a product-specific index of reviews for a cohort of products of the same kind with a mechanism to generate realistic opinionated conversational exchanges. While carefully tuned, our approach must still be considered an early prototype. Consequently, before asking real customers to use it, its fundamental capabilities must first be established. We therefore simulate in-store dialogues between a customer and a sales assistant, where both parties incorporate grounded opinions. These conversations are then systematically evaluated in an experimental setup that ascertains the perception of human readers regarding the realism of these dialogs (Section 4). Code and data: https://github.com/caisa-lab/OpinionConv

Related Work
Three lines of research are related to ours: opinionated question answering, conversational product search, and review-based conversation generation.

Opinionated Question Answering
While factoid Question Answering (QA) systems have a long tradition and some even outperform humans, non-factoid questions, such as opinions, explanations, or descriptions, are still an open problem (Cortes et al., 2021). Cardie et al. (2003) employed opinion summarization to help multi-perspective QA systems identify the opinionated answer to a given question. Yu and Hatzivassiloglou (2003) separated opinions from facts and summarized them as answers. The linguistic features of opinion questions have also been studied (Pustejovsky and Wiebe, 2005; Stoyanov et al., 2005). Kim and Hovy (2005) identified opinion leaders, who are a key component in retrieving the correct answers to opinion questions. Ashok et al. (2020) introduced a clustering approach to answer questions about products by accessing product reviews. Rozen et al. (2021) examined the task of answering subjective and opinion questions when no (or few) reviews exist. Jiang et al. (2010) proposed an opinion-based QA framework that uses manual question-answer opinion patterns.
Closer to our work, Moghaddam and Ester (2011) address the task of answering opinion questions about products by retrieving authors' sentiment-based opinions about a given target from online reviews. McAuley and Yang (2016) address subjective queries using relevance ranking, and Wan and McAuley (2016) extend this work by considering questions that have multiple divergent answers, incorporating aspects of personalization and ambiguity. AmazonQA (Gupta et al., 2019) is one of the largest review-based QA datasets. Its authors show that it can be used to learn relevance, in the sense that relevant opinions are those for which an accurate predictor can be trained to select the correct answer to a question as a function of the opinion. SubjQA (Bjerva et al., 2020) includes subjective comments on product reviews.

Conversational Product Search
Information is often gathered through conversations with a series of questions and answers. Conversational Question Answering (CQA) systems engage in such multi-turn conversations to satisfy a user's information need (Zaib et al., 2021). Despite the attention this task has received in e-commerce (Ricci et al., 2011; Bi et al., 2019; Zhang et al., 2018), building a successful conversational product search system for online shopping still suffers from the lack of realistic dialog datasets for model training (Xiao et al., 2021).

Review-Based Conversation Generation
Recently, multi-turn QA has grown more prominent (Cambazoglu et al., 2020). Product reviews are one of the sources of information being used for conversational product search. Penha et al. (2022) generate review-based explanations for voice-driven product search. Zhang et al. (2018) build a dataset to answer conversational questions, as illustrated in Figure 1 (Stage 2, "Information Search"). They extract feature-value pairs from reviews and convert each review into a conversation based on the mentioned pairs, but omit opinionated statements. Xu et al. (2019) explore the possibility of turning reviews into a knowledge source to answer questions. Feature-related, non-opinionated statements in reviews are flagged and appropriate questions are formulated.

Grounded Product Opinion Generation
This section introduces the OpinionConv construction pipeline to generate grounded opinionated conversations for product search based on product reviews. Figure 2 gives an overview of the pipeline's individual steps, grouped into preprocessing, information search dialog generation (Stage 2 of the customer decision process, which we reproduce from Zhang et al. (2018)), and evaluation dialog generation (Stage 3, our focus).

Data Source and Preprocessing
As a basis for grounded opinions, we utilize a crawl of Amazon product data, including reviews, created by Ni et al. (2019). The metadata enclosed includes product descriptions, multi-level product categories, and product information. For our proof-of-concept, we focus on one of its 24 product categories, Cell Phones and Accessories. As a first cleansing step, we reviewed the product data and added missing product details. We found the reviews to be of varying writing quality, especially with respect to basic syntax conventions, like the use of punctuation. We employed the model of Alam et al. (2020) to restore the punctuation, which enabled a more reliable sentence extraction and thus benefited the subsequently applied models, which were largely trained on "cleaner" data.
To extract the product features discussed in the reviews, we use the extraction model of Karimi et al. (2021). It is based on a hierarchical aggregation approach and was trained on the laptop review dataset of SemEval 2014 (Pontiki et al., 2014), performing best at that time. Given a review sentence, the model extracts feature terms on which an opinion has been expressed. On sentences containing such opinion statements, we then applied the sentiment analysis model of Zeng et al. (2019), which is based on self-attention to capture local and global context features to determine the polarity score of the opinion.
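To illustrate the combined output of these two steps, the following toy sketch maps a review sentence to (feature, polarity) pairs. The keyword rules are stand-ins of our own making; the actual pipeline uses the neural models of Karimi et al. (2021) and Zeng et al. (2019):

```python
# Toy stand-in for the feature-extraction and sentiment models in the
# pipeline; simple keyword lookups replace the neural models only to
# illustrate the (review sentence -> feature, polarity) data flow.
FEATURES = {"battery", "camera", "screen", "price"}
POS_WORDS = {"great", "impressive", "good", "bright"}
NEG_WORDS = {"poor", "weak", "bad", "dim"}

def extract_opinions(sentence):
    """Return (feature, polarity) pairs for opinions expressed in a sentence."""
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    features = tokens & FEATURES
    if not features:
        return []  # no product feature mentioned
    if tokens & POS_WORDS:
        polarity = "positive"
    elif tokens & NEG_WORDS:
        polarity = "negative"
    else:
        return []  # feature mentioned, but no opinion expressed on it
    return [(f, polarity) for f in sorted(features)]

print(extract_opinions("The battery life is impressive."))  # [('battery', 'positive')]
```

Each extracted pair is stored together with the review sentence it came from, so that later pipeline stages can quote the original wording.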

Information Search Dialog Generation
To generate the information search dialog of the customer decision process (see Figure 1), we reproduce the approach of Zhang et al. (2018). Reproducing the original dialogs turned out to be straightforward, and we verified our success by direct comparison to the data supplied with the original paper. The dialogs are structured as follows: The sales assistant asks for preferences on product features, and the customer answers, narrowing down the set of alternatives. The resulting set of alternatives is fed to the next stage of the customer decision process, the evaluation of alternatives.

Evaluation Dialog Generation
The generation of an opinion-based evaluation-of-alternatives dialog is divided into two steps: the generation of pairs of talking points based on reviews of the alternative products, and their combination into a multi-turn conversation, as exemplified in Figure 1. For lack of public corpora of in-store conversations, we resort to a template-based approach.
The templates are derived from common conversational negotiation strategies from the literature.

Dialog Turn Pair Generation
In each turn of an evaluation dialog, the customer and the sales assistant discuss the relative value of product features, as well as their benefits and shortcomings compared to alternative products. Sales assistants, by training and/or experience, are usually well-equipped to provide customers with satisfying answers to their questions, as well as to respond to opinions that customers express throughout the conversation.
The most salient open question in this respect is: How should a sales assistant react to a customer's opinion in the context of a negotiation? Negotiations combine features of claiming and creating value. Each requires unique strategies and tactics for a negotiator to effectively achieve their objectives while creating the greatest value possible for all parties (Thompson and Hastie, 1990). We take inspiration from three negotiation tactics (Dwyer et al., 1987; Sȃvescu, 2019): (1) Distributive negotiation: a competitive win-lose situation; any value claimed by one party is at the expense of the other. (2) Integrative negotiation: the parties create or generate value during the negotiation, and both parties may achieve mutual gains beyond what they would achieve independently, a win-win scenario. (3) Compatible negotiation: the parties desire the exact same outcome, so that there is no need for any trade-off.

Figure 3: Example of a basic opinionated dialog pair generation step: Given a product feature like "battery", opinionated statements are extracted from reviews of a given product to form part of a dialog between Customer C and Sales Assistant S.
For instance, as illustrated in Figure 3, where customer C's remark about a product feature (left) is countered by an opinionated counterargument from sales assistant S (right), we use the Deny-Disagreement tactic: the customer expresses a negative opinion on a feature of a product in question, whereas the sales assistant disagrees and counters with a positive opinion on the same feature. This tactic may correspond either to a win-lose or a win-win situation, depending on whose opinion applies more to the customer: If the customer is correct, they lose against the sales assistant, since the product is not switched. If the sales assistant is correct, they both win, since the customer still gets what they wanted, and the sales assistant may still get to sell the product in question.
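A minimal sketch of this pairing step follows; the index layout and function name are ours for illustration, not the paper's implementation. The assistant's counter is grounded because it is retrieved verbatim from a review of the same product and feature:

```python
# Hypothetical review index keyed by (product, feature, polarity); in the
# real pipeline these entries come from the extraction and sentiment steps.
REVIEW_INDEX = {
    ("OnePlus 3", "battery", "negative"): ["The battery drains far too fast."],
    ("OnePlus 3", "battery", "positive"): ["The battery easily lasts a full day."],
}

def deny_disagreement(product, feature):
    """Counter a negative customer opinion with a grounded positive one."""
    counters = REVIEW_INDEX.get((product, feature, "positive"), [])
    return counters[0] if counters else None  # None: tactic not applicable

print(deny_disagreement("OnePlus 3", "battery"))  # The battery easily lasts a full day.
```

If no positive opinion on the same product and feature exists in the index, the tactic is not applicable and another dialog pair type must be chosen.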
A key constraint that we enforce by generating grounded opinions (i.e., opinions rooted in real product reviews, as exemplified in Figure 3) is that neither the customer nor the sales assistant can "lie" to each other, as their opinions are backed by a real person's opinion about the product and its feature. Thereby the dialog turns are more realistic, despite both parties being simulated. Moreover, our dialog turns enforce a conversational concept flow (Li et al., 2023), as the product features and their attributes are connected as key concepts.

Question-Answer
Customer asks about the sales assistant's view on a feature of a product. Sales assistant expresses a positive view on it.

Deny-Disagreement
Opinion: P-1, F-A, negative / Opinion: P-1, F-A, positive. Customer expresses a negative opinion on a feature of a product. Sales assistant disagrees and expresses a positive opinion on it.

Deny-Switch Product
Opinion: P-1, F-A, negative / Opinion: P-2, F-A, positive. Customer expresses a negative opinion on a feature of a product. Sales assistant switches the product and expresses a positive opinion on the same feature w.r.t. the new product.

Deny-Switch Feature
Opinion: P-1, F-A, negative / Opinion: P-1, F-B, positive. Customer expresses a negative opinion on a feature of a product. Sales assistant disagrees and expresses a positive opinion on a different feature of the same product.

Search-Agreement
Opinion: P-1, F-A, positive / Opinion: P-1, F-A, positive. Customer expresses a positive opinion on a feature of a product. Sales assistant agrees and expresses another positive opinion on it.

Search-Switch Feature
Opinion: P-1, F-A, positive / Opinion: P-1, F-B, positive. Customer expresses a positive opinion on a feature of a product. Sales assistant agrees and expresses a positive opinion on a different feature of the same product.

Search-Warning
Opinion: P-1, F-A, positive / Opinion: P-1, F-B, negative. Customer expresses a positive opinion on a feature of a product. Sales assistant warns the customer and expresses a negative opinion on a different feature of the same product.
In Table 1, we list the templates for dialog pairs according to the different negotiation tactics derived from the literature; seven patterns are devised: one question-answer pair and six opinion-opinion pairs. The customer's utterance consists of a feature-specific opinion, either positive or negative, about a certain product and one of its features, extracted from one of its reviews. The sales assistant's utterance is a response that expresses either a positive or a negative opinion, not necessarily on the same product or feature.
As can be seen, depending on the type of dialog pair, different negotiation tactics may apply. The sales assistant is allowed to switch the polarity, the feature under discussion, and the product under negotiation. The mapping of a dialog pair to a negotiation tactic thus depends on the factuality of the opinions expressed by the customer and the sales assistant, on how the price changes in case of a product switch (the price of the new product may be lower, similar, or higher), but also on whether the customer gets what they want. For instance, a switch to a pricier product is certainly worthwhile for the sales assistant, as long as the customer ends up with a desired feature. However, given the necessity of interpreting each generated dialog with respect to its factuality, a mapping between dialog pair and negotiation tactic must be decided on a case-by-case basis, which is beyond the scope of this paper.
In cases where the product is being switched, the sales assistant is constrained to only two types of products: (1) retrieved products, i.e., the products from the set of alternatives retrieved in the information search dialog at the outset of a conversation based on customer preferences; and (2) also-viewed products, i.e., the products listed in the metadata that have been viewed by other customers. Customers thus can go beyond their original preferences and the set of alternatives.
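This constraint amounts to a simple set union; the function and variable names below are ours for illustration:

```python
# Sketch of the product-switch constraint: switch targets are limited to the
# retrieved alternatives plus the "also viewed" products from the metadata,
# excluding the product currently under discussion.
def allowed_switch_targets(retrieved, also_viewed, current_product):
    """Products the sales assistant may propose instead of the current one."""
    return (set(retrieved) | set(also_viewed)) - {current_product}

targets = allowed_switch_targets(
    retrieved={"OnePlus 3", "Axon 7"},
    also_viewed={"Moto G4"},
    current_product="OnePlus 3",
)
print(sorted(targets))  # ['Axon 7', 'Moto G4']
```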

Template-based Alternative Evaluation
The last step of our pipeline generates conversations composed of multiple dialog pairs, based on a negotiation strategy. Considering the diversity of real dialogs and the fact that a coherent conversation should have smooth transitions between turns (Li et al., 2023), we define a diverse set of conversation templates inspired by past studies on negotiation in behavioral economics (Pruitt, 1981; Fisher and Ury, 1981; Thompson et al., 2010), including both high-level (e.g., insisting on your position: Disagreement) and low-level (e.g., focus on interests: Reaction) dialog acts. We follow Zhou et al. (2019) and devise 14 conversation templates with different combinations of the generated question-answer and opinion-opinion pairs. Table 2 exemplifies one of them. We adapt the "CraigslistBargain" setting of He et al. (2018), where a buyer and a seller negotiate the price of a product. But unlike in their work, the sales assistant and the customer negotiate not only the price but primarily the relative merits of product features, of which price is only one.
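The assembly step can be sketched as follows; the template and pair contents below are invented for illustration and do not reproduce any of the paper's 14 actual templates:

```python
# A conversation template is an ordered list of dialog-pair types; a
# conversation is assembled by consuming one generated pair per slot.
def assemble(template, pair_pool):
    """Build a (speaker, utterance) sequence from generated dialog pairs."""
    conversation = []
    for pair_type in template:
        if not pair_pool.get(pair_type):
            raise ValueError(f"no generated pair of type {pair_type!r}")
        customer_turn, assistant_turn = pair_pool[pair_type].pop(0)
        conversation.append(("C", customer_turn))
        conversation.append(("S", assistant_turn))
    return conversation

template = ["question-answer", "deny-disagreement"]
pool = {
    "question-answer": [("What do you think of the battery?",
                         "Reviewers report it easily lasts a full day.")],
    "deny-disagreement": [("The camera seems rather weak to me.",
                           "I disagree; reviews praise its low-light shots.")],
}
dialog = assemble(template, pool)
print([speaker for speaker, _ in dialog])  # ['C', 'S', 'C', 'S']
```

A template can only be instantiated when the pool contains a generated pair of every required type, which is why the number of usable templates depends on the reviews available for the products under discussion.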

Evaluation
A volunteer who is asked to pose as a customer in a laboratory user study, and who has no real intention of investing a fairly large amount of his or her own money in the purchase of a product, does not usually have the same information needs as a real customer.At the same time, we consider it unethical to confront real customers with an early prototype of a conversational sales assistant.Before a practical assistant can be developed, the basic means of generating informed opinions must first be established.
In our evaluation, we therefore decided to simulate full conversations between a hypothetical customer and a hypothetical sales assistant as described in the previous section.We then designed two user studies in which we specifically investigated whether human subjects consider these conversations realistic.
Study 1 investigates whether subjective narratives in conversational product search are considered important compared to purely factual exchanges. Study 2 investigates individuals' perceptions of the quality and realism of the conversations generated by OpinionConv. We conducted the two studies by recruiting volunteers living in the US or UK on Prolific. Table 3 shows the total sample size and key demographic information for both studies. As can be seen, we had fewer male participants than female, and more than 60% of the participants are between 25 and 45 years old. Key to both study designs is that participants initially believed that the conversations are genuine transcripts of real sales negotiations recorded in a store, instead of generated ones. At the end of the questionnaire, it was revealed that they are not.

Study 1: Importance of Product Opinions
The first study started with the following instruction: "Below is an automatically generated transcript of a sales conversation. We show two variants: Variant 1 is focused on the customer's preferences and requirements. Variant 2 starts similarly, but then continues with an opinionated discussion." After reading both variants of the same dialog, participants were asked: "Which of the two variants would you as a customer hold with the sales assistant while searching for a smartphone?" The survey concluded by asking participants an open-ended question to explain their judgment for the previous question, which also allowed us to ascertain that they had actually read the conversations.
As a result, we find that 83% of the 100 participants of Study 1 prefer Variant 2 over Variant 1, which confirms that they tend to prefer opinionated conversations over exclusively factual ones when searching for and evaluating a product.

Study 2: Perceptions of Dialog Realism
For this study, the questionnaire consisted of two separate parts. In the first part, we again let participants believe they are reading a transcript of a real conversation by instructing them as follows: "Suppose you are in an electronics store. While browsing, you happen to overhear part of a conversation between a customer and a sales assistant. Both exchange opinions about the features of one or more products." They are then asked to answer three questions using a 4-point Likert scale: (1) Definitely yes, (2) Rather yes, (3) Rather not, and (4) Definitely not. As depicted in Figure 4, bottom, we ask questions for the following goals: customer understanding, sales assistant answer sufficiency, and reasonableness of exchange. To investigate whether participants' perceptions change significantly after they learned that the conversation was generated, at the beginning of the second part we reveal the truth and declare that the conversation they just read was not a real but an automatically generated one. After the disclosure, they were asked to answer Questions 4 to 6 using the same Likert-scaled responses as shown in Figure 4, in order to observe any changes of opinion. For each of our 14 conversation templates, we generated ten examples, and for each example, three participants were asked to answer the questions: a total of 140 questionnaires answered by 420 participants.
Figure 4 shows the distribution of participants' responses to each question, outlining the alterations in perception after revealing the automated generation of the conversations. Both user (Q1 & Q4) and agent (Q2 & Q5) utterances, as well as the overarching dialog (Q3 & Q6), were subjected to quality evaluation. The data reveals that prior to revealing the truth, over 66% of evaluators deemed the conversation reasonable (both "yes" answers combined), marginally reducing to 64% post-revelation (Q3 & Q6). Regarding participants' assessment of the customer's understanding of the sales assistant (Q1 & Q4), over 78% affirmed it, which reduces to 77% after the disclosure, albeit almost half of the participants switched from definitely yes to rather yes. Regarding the evaluation of the sales assistant's response quality to customer inquiries (Q2 & Q5), over 60% of participants agreed that the responses were sufficient, while the disclosure incited a reduction of the combined definitely yes and rather yes responses to just over 54%. Altogether, the responses indicate a generally positive reception of the conversations generated by OpinionConv, with variation in assessment among different conversation templates. While roughly two thirds of participants agree with this outcome, among those who considered the overall exchange unreasonable, more than half improved their rating from definitely not to rather not.
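For transparency, the aggregation behind these combined percentages simply pools the two positive Likert options into "yes"; the responses below are invented toy data, not study data:

```python
from collections import Counter

# Pool the two positive options of the 4-point Likert scale into "yes",
# as done when reporting the combined percentages above.
def percent_yes(responses):
    """Share of 'Definitely yes' and 'Rather yes' answers, in percent."""
    counts = Counter(responses)
    yes = counts["Definitely yes"] + counts["Rather yes"]
    return 100.0 * yes / len(responses)

toy = ["Definitely yes", "Rather yes", "Rather not", "Definitely yes"]
print(percent_yes(toy))  # 75.0
```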

Participants' Comments
In both studies, participants were asked to explain their judgment in about two sentences; in the case of Study 2, once in each part of the questionnaire. With respect to assessing the conversation realism, we mostly observe positive comments. For instance, before disclosing the nature of the conversation, one participant commented: "The customer was recommended the phone by a friend, and the sales assistant was able to give further information on the phone. Likewise the sales assistant was able to inform the customer on a drawback associated with the phone.", and after: "The conversation appeared real as there was flow - i.e. the sales assistant was able to connect with what the customer said and elaborate upon it. Likewise the sales assistant was able to pick up on key details associated with the phone like the camera and OS." However, we also observe three key concerns raised: (1) Some features are of no interest to be discussed, e.g., "Why would the person asks the sales assistant about colors? That seems out of the ordinary." (2) Some participants judge the conversations based on their personal experience with real sales assistants, e.g., "As always in marketing strategies, he [the sales assistant] was just trying to sell a phone not what he [the customer] wanted." (3) A stronger argumentation is expected by some, e.g., "While the sales assistant did respond in a way that does answer the customer's questions, their responses are not so direct and detailed as to be helpful towards the customer. For example, for the question about the screen, stating that it's 'bright and good quality' would not be convincing enough for me to want to buy the product." Reading the participants' comments and observing the results of these crowd-sourced qualitative evaluations suggested several new research directions for future work relating to common-sense product knowledge and argument generation.

Conclusion
We introduce OpinionConv, a new conversation generation pipeline that generates opinionated multi-turn conversations for product search. OpinionConv was mainly designed to incorporate subjective narratives into conversational product search. The pipeline presented in this work can be easily extended to different domains. Recent progress in conversational systems, such as ChatGPT and YouChat, has shown tremendous improvements in natural language dialog between humans and conversational agents. However, when it comes to holding an opinionated conversation, specifically in product search, they are still limited for lack of grounding in real-world experience with products. This motivated the design of a pipeline to control both the dialog coherence and the information to be mentioned in the utterances. However, it should be mentioned that the trade-off between a coherent conversation and a more diverse conversation needs to be studied further. To validate the quality of the conversations generated by OpinionConv, we conducted two extensive human evaluations. Our results confirm the conversational plausibility of the generated dialogs and reveal that people tend to exchange personal opinions while searching for a product.
In future work, we envision a customer-oriented assistant for buying products that assists customers in discussing the merits of products with a sales assistant, grounded in real-world reviews.

Limitations
As mentioned in Section 3.1, we focused on the Cell Phones and Accessories category of products.However, there is no inherent limitation of our design that prevents future work from including conversations related to other product categories.
Furthermore, an opinion is an observation or a belief that does not need evidence to support itself, whereas an argument requires premises. As we discussed in Section 4.3, study participants expected stronger arguments in the generated conversations, rather than only expressed opinions. Therefore, future work should address this aspect by utilizing argument mining techniques for generating argumentative dialogues.

Figure 1 :
Figure 1: A grounded opinionated conversation generated by OpinionConv based on Conversation Template 4.

Figure 2 :
Figure 2: High-level overview of our approach in OpinionConv for generating opinionated multi-turn conversations.

Figure 4 :
Figure 4: Evaluation results for Study 2; Questions Q1, Q2, and Q3 are asked in the first part of the questionnaire (before disclosing that the conversations are generated), Q4, Q5, and Q6 in the second part (after disclosure).

Table 1 :
Negotiation tactics used in dialog pair templates (P=product, F=feature).

Table 2 :
Example of the combination of dialog pairs in a conversation template.

Table 3 :
Demographics of study participants.