TransSum: Translating Aspect and Sentiment Embeddings for Self-Supervised Opinion Summarization

In this paper, we propose TransSum, a novel self-supervised opinion summarization framework that models opinion summaries as translations operating on low-dimensional aspect and sentiment embedding spaces. Specifically, we propose two contrastive objectives to learn the crucial aspect and sentiment embeddings of reviews, taking advantage of intra- and inter-group invariances that have not been considered in previous studies. Furthermore, these embeddings can be used to reduce opinion redundancy and to construct highly relevant reviews-summary pairs for training a supervised multi-input opinion summarization model. Experimental results on three different domains show that TransSum outperforms several strong baselines in generating informative, relevant and low-redundancy summaries, demonstrating the effectiveness of our approach.


Introduction
Opinion summarization, which focuses on automatically generating summaries that reflect the salient opinions expressed in a group of documents (e.g., the user reviews of a product in Figure 1), has been receiving great attention due to its usefulness for digesting massive amounts of opinionated text (Ku et al., 2006; Cheung et al., 2009; Chu and Liu, 2019). For example, a representative summary of a product's reviews can not only replace large numbers of reviews for potential customers to read, but can also provide more explanation than a simple overall sentiment rating, e.g., answering "What is the biggest complaint about the iPod screen?".
However, compared with supervised summarization in the news domain, annotated training data for opinion summarization is expensive to acquire. Due to this lack of gold-standard summaries for training, most existing works focus on unsupervised opinion summarization and

Figure 1: The proposed TransSum targets learning corresponding aspect and sentiment embeddings for reviews (green arrows) through contrastive learning based on the aspect and sentiment invariances (blue arrows). These embeddings are used to construct reviews-summary pairs of high relevance (red arrows), so as to train a supervised multi-input opinion summarization model. Best viewed in color. (The figure depicts two groups of reviews, one per entity, e.g., DVD discs and a bar, each containing positive and negative reviews.)
treat it as a standard multi-document summarization task. They either struggle to reduce opinion redundancy efficiently or output summaries lacking relevance to the input reviews. In particular, many previous studies focus on extractive approaches (Paul et al., 2010; Fabbrizio et al., 2014; Rossiello et al., 2017; Narayan et al., 2019), which copy text from the input reviews but tend to be redundant and less informative (Chu and Liu, 2019). Some recently proposed abstractive methods are based on unsupervised representation learning, such as autoencoders (Chu and Liu, 2019; Amplayo and Lapata, 2019; Brazinskas et al., 2020a) or variational autoencoders (Brazinskas et al., 2020b), but mainly focus on content transformation within each group of reviews. Other studies create synthetic reviews-summary pairs to train a supervised multi-document summarization model (Amplayo and Lapata, 2019; Brazinskas et al., 2020b; Amplayo et al., 2021), for example by sampling a review from a corpus of product reviews and treating it as a summary of the remaining reviews; however, such settings do not guarantee the relevance between the reviews and the constructed pseudo-summaries.
In an effort to overcome these challenges, we propose TransSum, a novel self-supervised framework for opinion summarization that consists of two main modules and does not require any gold summaries for training. (i) In the translation-based review modeling module, we represent reviews with only their corresponding aspect and sentiment embeddings (as shown in Figure 1), with the purpose of discarding unnecessary information. We decompose each review into aspect and sentiment embeddings through reconstruction and contrastive learning (van den Oord et al., 2018; He et al., 2020) based on two novel intra- and inter-group invariances. First, the real-world reviews in a group may discuss various opinions covering different aspects, but they all concern a specific entity (e.g., reviews about a specific product). Hence, the aspect information of reviews in the same group should be closer than that of reviews in different groups (the aspect invariance in Figure 1): in the aspect embedding space, distances between intra-group reviews should be smaller than distances between inter-group reviews. Second, the sentiment information of reviews with the same sentiment label should be closer than that of reviews with different labels (the sentiment invariance in Figure 1): in the sentiment embedding space, distances between reviews with the same sentiment should be smaller than distances between reviews with different sentiments. (ii) In our multi-input opinion summarization module, we reduce opinion redundancy by combining similar embeddings, and use reviews with similar aspect embeddings to construct reviews-summary pairs of high relevance, which are then used to train a supervised multi-input summarization model.
We conduct extensive experiments to show the superiority of our method. Experimental results on three different domains show that our method outperforms several strong baselines in generating informative, relevant, low-redundancy and fluent summaries. We also perform ablation studies to analyze the effectiveness of each module of our method.
In summary, our main contributions are: • To the best of our knowledge, we are the first to generate opinion summaries from only the aspect and sentiment embeddings, which unlocks a critical bottleneck for unsupervised opinion modeling and takes a step toward more complex and controllable designs.
• We propose a novel self-supervised framework (TransSum) to generate opinion summaries without access to expensive annotations by disentangling reviews into aspect and sentiment embeddings and automatically constructing highly relevant reviews-summary pairs for model training.
• Experimental results on three domains show that our approach outperforms several strong baselines, especially in terms of relevance and non-redundancy.

Overview
As aforementioned, a good opinion summary needs to cover major opinions/sentiments on different aspects of the entity (e.g., a movie, product, business) discussed in a group of reviews. Inspired by this observation, we propose a self-supervised framework (titled TransSum), aiming to generate opinion summaries without access to expensive annotations by interpreting them as translations operating on the aspect and sentiment embeddings. As noted in a recent theoretical model of importance in summarization (Peyrard, 2019), a good summary should meet three requirements: (i) minimum redundancy, (ii) maximum relevance with the input document(s), and (iii) maximum informativeness. Based on the observation that reviews are usually created to express users' sentiments on certain aspects of a specific entity (e.g., the price and battery of a PC), we reasonably define informativeness, the amount of new information contained in the opinion summary relative to the background knowledge, as the aspect and sentiment information. The purpose is to reduce unnecessary information in the opinion summary, such as personal information or other irrelevant details.
Specifically, TransSum consists of two main components: (1) A translation-based review modeling module that learns only aspect and sentiment embeddings from each review for opinion summarization, to keep only the key and useful information (requirement iii). The aspect and sentiment embeddings of reviews are learned through reconstruction and two contrastive objectives, which take advantage of aspect and sentiment invariance of intra-and inter-group reviews (detailed in Sec 2.3).
(2) A multi-input opinion summarization module that learns to generate the summary from the redundancy-reduced combination of the aspect and sentiment embeddings of input reviews (requirement i). It is trained by synthetic reviews-summary pairs of high relevance (requirement ii), which are constructed based on the assumption that reviews with the same aspect information (embeddings) are likely to express similar opinions (detailed in Sec 2.4).

Notations
More formally, let D denote a review corpus in a domain (e.g., product reviews), which consists of m groups of reviews. Each group G contains n reviews {r_1, ..., r_i, ..., r_n} about a specific entity e (e.g., a product), where n is not fixed. For each review r_i in G, we write its number of tokens as |r_i|, i.e., r_i = {r_i^(1), ..., r_i^(|r_i|)}, and use r_{-i} = {r_1, ..., r_{i-1}, r_{i+1}, ..., r_n} to denote the remaining n-1 reviews. Each review has a binary sentiment label x (e.g., positive or negative), which indicates the overall sentiment polarity of the review. The aspect and sentiment embeddings of r_i are denoted as a_i ∈ R^{|r_i|×k} and s_i ∈ R^{|r_i|×k}, respectively. E and D are the encoder and decoder, respectively.
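To make the notation concrete, the following toy instantiation of a corpus mirrors the structure above; the tokens, labels, and variable names are hypothetical, chosen only for illustration:

```python
# A toy corpus D with m = 2 groups, one per entity; each group holds a
# variable number n of reviews, and each review is a token list plus a
# binary sentiment label x (all values hypothetical).
corpus = {
    "entity_1": [
        {"tokens": ["cheap", "drinks", "great", "happy", "hour"], "label": "positive"},
        {"tokens": ["the", "box", "was", "too", "big"], "label": "negative"},
    ],
    "entity_2": [
        {"tokens": ["awesome", "bar", "staff"], "label": "positive"},
    ],
}

group = corpus["entity_1"]     # one group G
r_1 = group[0]                 # review r_1; |r_1| = 5 tokens
r_minus_1 = group[1:]          # r_{-1}: the remaining n - 1 reviews

assert len(r_1["tokens"]) == 5
assert len(r_minus_1) == len(group) - 1
```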
The goal of opinion summarization is to generate a summary y that covers opinions mentioned in the group of reviews, in other words, y can be considered "a representative review" that can replace the group of reviews {r 1 , · · · , r i · · · , r n } in terms of informativeness. Note that we cannot access gold-standard opinion summaries for each group of reviews, as the human-annotated summaries do not exist in most domains.

Translation-Based Review Modeling
The translation-based review modeling module aims to learn aspect and sentiment embeddings for reviews (the left block in Figure 2).
For each review r_i in the group G, we encode it with a Transformer (Vaswani et al., 2017) encoder E; the output encoding h_i ∈ R^{|r_i|×k} is:

h_i = E(r_i),

where k is the embedding dimension. Inspired by Zhong et al. (2019), we initialize the token embeddings of E with those of the BERT-base model (Devlin et al., 2019). We then use projection matrices A_a ∈ R^{k×k} and A_s ∈ R^{k×k} to project h_i into the aspect and sentiment spaces as a_i and s_i (the blue and red squares in Figure 2), respectively.
For later use, we further denote â_i and ŝ_i as the mean vectors (over tokens) of the aspect and sentiment embeddings a_i and s_i, respectively.
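The projection and mean-pooling steps can be sketched as follows; the encoder output and projection matrices here are random stand-ins for the learned parameters, and the token-wise sum c_i = a_i + s_i is the translation used for reconstruction:

```python
import random

K = 4  # embedding dimension k (illustrative)

def matvec(M, v):
    """Multiply a k x k matrix M by a length-k vector v."""
    return [sum(M[r][c] * v[c] for c in range(K)) for r in range(K)]

random.seed(0)
# Token encodings h_i of a 3-token review (stand-ins for encoder output).
h = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(3)]
# Projection matrices A_a, A_s (stand-ins for learned parameters).
A_a = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(K)]
A_s = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(K)]

a = [matvec(A_a, t) for t in h]   # aspect embeddings a_i
s = [matvec(A_s, t) for t in h]   # sentiment embeddings s_i
# Mean vectors over tokens, used later for similarity computations.
a_hat = [sum(t[d] for t in a) / len(a) for d in range(K)]
s_hat = [sum(t[d] for t in s) / len(s) for d in range(K)]
# Translation: c_i = a_i + s_i, token-wise.
c = [[x + y for x, y in zip(ta, ts)] for ta, ts in zip(a, s)]

assert len(a_hat) == K and len(c) == len(h)
```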
Translation-Based Reconstruction: We assume that each review is "an opinion summary" of the user's intention and attitude, and model the review as a translation from the aspect and sentiment embeddings (the yellow square in Figure 2):

c_i = a_i + s_i.

To maximize informativeness and reduce unnecessary information, we reconstruct r_i from only the embedding c_i with a decoder D. The reconstruction loss is:

L_rec = ℓ_ce(D(c_i), r_i),

where ℓ_ce is the cross-entropy loss (de Boer et al., 2005) and D is a Transformer (Vaswani et al., 2017) decoder with cross-attention on c_i. Following prior work (Amplayo et al., 2021; ElSahar et al., 2020), we adopt label smoothing (Szegedy et al., 2016) on r_i instead of computing the loss with one-hot categorical distributions.

Contrastive Learning of Aspect and Sentiment Embeddings: We perform contrastive learning based on the following two objectives: (i) the aspect embeddings of intra-group reviews should be "closer" to each other than those of inter-group reviews, and (ii) the sentiment embeddings of reviews with the same sentiment label should be "closer" to each other than those of reviews with different sentiment labels, even if they are in different groups.
More concretely, for the aspect embedding a_i, we expect its similarity with a "similar" sample a_i^+ to be far greater than its similarity with a "dissimilar" sample a_i^-, i.e., Sim(â_i, â_i^+) ≫ Sim(â_i, â_i^-). Here a_i^+ is the aspect embedding of a review sampled from the same group, and a_i^- is the aspect embedding of a review sampled from another group. We use the dot product between mean embeddings as the similarity measure Sim, which can be regarded as a measure of the angle between the two embeddings in the vector space. The aspect-based contrastive objective is then an InfoNCE-style loss (van den Oord et al., 2018):

L_asp = -log [ exp(Sim(â_i, â_i^+)) / (exp(Sim(â_i, â_i^+)) + exp(Sim(â_i, â_i^-))) ].

As for the sentiment embedding s_i, the "similar" sample s_i^+ is the sentiment embedding of a review sampled from a different group but with the same sentiment label, and the "dissimilar" sample s_i^- is the sentiment embedding of a review sampled from a different group and with a different sentiment label. The sentiment-based contrastive objective is defined analogously:

L_sen = -log [ exp(Sim(ŝ_i, ŝ_i^+)) / (exp(Sim(ŝ_i, ŝ_i^+)) + exp(Sim(ŝ_i, ŝ_i^-))) ].

To the best of our knowledge, we are the first to go beyond intra-group information modeling by further considering inter-group contrastive learning of aspect and sentiment embeddings.

Figure 2: Architecture of TransSum, which consists of two main components: (1) a translation-based review modeling module that learns aspect and sentiment embeddings, and (2) a multi-input opinion summarization module that learns to generate summaries that are low-redundancy and highly relevant to the input reviews. The encoder and decoder are shared, and the red arrows indicate the data flow in the inference phase. Best viewed in color.
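A minimal sketch of the aspect-side objective under an InfoNCE-style formulation with dot-product similarity (our exact implementation details may differ; the embedding values below are hypothetical):

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def contrastive_loss(anchor, positive, negatives):
    """InfoNCE-style loss: the loss shrinks as Sim(anchor, positive)
    grows relative to Sim(anchor, negative)."""
    pos = math.exp(dot(anchor, positive))
    neg = sum(math.exp(dot(anchor, n)) for n in negatives)
    return -math.log(pos / (pos + neg))

a_hat = [0.5, 0.2]                      # mean aspect embedding of r_i
a_pos = [0.4, 0.3]                      # a review from the same group
a_negs = [[-0.6, 0.1], [-0.2, -0.5]]    # reviews from other groups

l_asp = contrastive_loss(a_hat, a_pos, a_negs)
assert l_asp > 0
# A positive identical to the anchor yields an even smaller loss.
assert contrastive_loss(a_hat, a_hat, a_negs) < l_asp
```

The sentiment-side objective L_sen reuses the same function with sentiment mean embeddings and label-based sampling of positives and negatives.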
To further enlarge the disagreement between the aspect and sentiment projection matrices and reduce parameter redundancy, we additionally add an orthogonality-style regularization loss:

L_reg = ||Ã Ã^T − I||_F^2,

where Ã = [A_a; A_s] stacks the two projection matrices and I is the identity matrix.

Multi-Input Opinion Summarization
After learning the low-dimensional aspect and sentiment embeddings of reviews, we can construct reviews-summary pairs of high relevance based on the similarity of aspect embeddings, and use them to train a supervised multi-input opinion summarization model (the right block in Figure 2). Although the reviews in a group discuss various viewpoints covering different aspects, they all concern the same entity. In other words, they may repeat discussions of certain aspects many times, and may also include unique aspects of their own. Moreover, opinions on the same aspect are likely to agree in real scenarios; e.g., if most users complain about the high price of a product, the next price-focused review is likely to be negative as well. Based on this observation, we reduce redundancy across similar aspects and use reviews with similar aspects to construct a high-quality dataset whose reviews-summary pairs are highly relevant.
High-Relevance Dataset Creation: We aim to find a subset of r_{-i} whose reviews are similar to r_i in the aspect embedding space, and use r_i as the target (pseudo) opinion summary of this subset. We observe that in real reviews the majority of opinions on the same aspect are consistent with each other, so we believe most reviews-summary pairs created in this way can be used to train a model that captures and summarizes the major opinions of the input reviews. More sophisticated dataset-creation strategies are left for future work.
In practice, we assign a weight w_j to each review in r_{-i}, i.e., we down-weight low-relevance reviews instead of searching for a subset of only high-relevance reviews (as shown by the yellow arrows in Figure 2). For each review r_j in r_{-i}, we calculate its distance to r_i in the aspect embedding space as:

d_{ij} = ||â_i − â_j||_2,

and normalize the negative distances with a softmax to obtain the weights:

w_j = exp(−d_{ij}) / Σ_{l≠i} exp(−d_{il}).

We then construct the reviews-summary pair <r_{-i}, r_i> with the weights w_{-i} = (w_1, ..., w_{i−1}, w_{i+1}, ..., w_n), which will be used later. Note that some prior work (Brazinskas et al., 2020b; ElSahar et al., 2020; Brazinskas et al., 2020a; Amplayo et al., 2021) adopted a leave-one-out self-supervision setting (Besag, 1975) similar to ours, but did not take into account the relevance between each review and the pseudo-summary; such methods can be considered a special case of ours with the uniform distribution w_{-i} = (1/(n−1), ..., 1/(n−1)).
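The weighting step can be sketched as follows, assuming Euclidean distance in the aspect space and a softmax over negative distances (an assumption for illustration; the embedding values are hypothetical):

```python
import math

def relevance_weights(anchor, others):
    """Softmax over negative distances: reviews close to r_i in the
    aspect embedding space receive large weights, distant ones small."""
    exps = [math.exp(-math.dist(anchor, o)) for o in others]
    z = sum(exps)
    return [e / z for e in exps]

a_i = [1.0, 0.0]                                  # mean aspect embedding of r_i
a_others = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]  # reviews in r_{-i}
w = relevance_weights(a_i, a_others)

assert abs(sum(w) - 1.0) < 1e-9     # weights form a distribution
assert w[0] > w[1] > w[2]           # nearer reviews get larger weights
```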

Embedding-Based Redundancy Reduction:
Aside from creating a high-relevance synthetic dataset, we use the learned embeddings to reduce redundancy. We regard the embedding differences between reviews as their natural variation, and perform a weighted pooling operation to merge redundant information (similar embeddings):

ĉ_{-i} = Σ_{j≠i} w_j c_j.

Note that w is a uniform distribution in the inference phase, i.e., the weight of each input review is equal.
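The weighted pooling is simply a weighted mean over the review embeddings; here is a sketch with hypothetical two-dimensional embeddings, showing both the training-time relevance weights and the uniform inference-time weights:

```python
def weighted_pool(weights, embeddings):
    """Weighted mean of the review embeddings c_j: near-duplicate
    embeddings are collapsed into a single decoder input."""
    k = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings))
            for d in range(k)]

c_reviews = [[1.0, 2.0], [1.0, 2.0], [3.0, 0.0]]  # two near-duplicates
w_train = [0.5, 0.25, 0.25]   # relevance weights during training
w_infer = [1/3, 1/3, 1/3]     # uniform weights at inference

c_hat = weighted_pool(w_train, c_reviews)
assert c_hat == [1.5, 1.5]

c_inf = weighted_pool(w_infer, c_reviews)   # plain average
assert abs(c_inf[0] - 5/3) < 1e-9
```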
Finally, we generate the opinion summary of r_{-i}, and the summarization loss L_sum is:

L_sum = ℓ_ce(D(ĉ_{-i}), r_i),

where D is the decoder shared with the previous module; we again apply label smoothing (Szegedy et al., 2016) on r_i.

Training
Finally, we optimize the sum of the above losses:

L = L_rec + L_asp + L_sen + L_reg + L_sum.

We also explored non-equal weighting of the losses but did not find a meaningful difference in outcomes. We perform beam search decoding at inference time.

Datasets: We evaluate on three review datasets: (1) Yelp and (2) Amazon, two collections of business and product reviews, and (3) Rotten Tomatoes (RT) (Wang and Ling, 2016), which has a large set of critic reviews for various movies. Detailed statistics of the three datasets are shown in Table 1. For Yelp and Amazon, there are no gold-standard summaries for the large training corpora, but the small development and test sets have summaries written by Amazon Mechanical Turk (AMT) crowd-workers. In RT, each set of reviews has a gold-standard opinion summary written by an editor, but we do not use the ground-truth summaries for training, due to the unsupervised setting. All reviews carry a binary sentiment label (positive or negative); for Yelp and Amazon, which have 1-5 scale ratings, we mark reviews with scores below 3 as negative and the rest as positive. Implementation details are given in the supplementary materials.

Compared Methods: We compare TransSum with several state-of-the-art unsupervised summary generation methods, some of which can be considered special cases of our method.
Extractive systems create summaries by selecting a subset of salient sentences from the input reviews. They include: (1) LexRank (Erkan and Radev, 2004), a PageRank-like algorithm that selects the review closest to the centroid of a group as the summary; (2) W2VCent (Rossiello et al., 2017), a centroid-based multi-document summarization method that uses word embeddings (Mikolov et al., 2013) instead of TF-IDF to represent each sentence; and (3) Multi-Lead-1 (See et al., 2017), which constructs the summary from the leading sentences of each review in a group. We also report the upper bound of extractive methods, i.e., the highest-scoring review in a group when computing ROUGE-L (Lin and Hovy, 2003) against the reference summaries.
We also compare with six state-of-the-art abstractive models, where summaries are generated from scratch: (1) Opinosis (Ganesan et al., 2010), a graph-based method that uses token-level redundancy to generate summaries; (2) MeanSum (Chu and Liu, 2019), an autoencoder that generates summaries by reconstructing the mean of review encodings, which is in fact a special case of our method without the contrastive transformations of aspect and sentiment embeddings and without high-relevance dataset creation; (3) OpinionDigest, a combination of an aspect-based sentiment analysis model and a phrase-to-review seq2seq model, which can be seen as using opinion phrases to model summaries rather than aspect and sentiment embeddings as we do; (4) DenoiseSum (Amplayo and Lapata, 2020), which creates a synthetic dataset by treating a review and its noisy versions as the summary and pseudo-review input, instead of using the aspect similarity of real-world reviews as we do; (5) CopyCat (Brazinskas et al., 2020b), a hierarchical variational autoencoder that learns a latent code of the summary under a leave-one-out self-supervision setting, which can be regarded as a special case in which TransSum does not consider the relevance between the input reviews and the constructed summaries; and (6) PlanSum (Amplayo et al., 2021), which uses adversarial learning to learn the aspect and sentiment distributions of reviews, instead of the intra- and inter-group contrastive transformations we use. Note that we do not compare with methods that use gold summaries, such as Brazinskas et al. (2020a).

Automatic Evaluation
For automatic summary evaluation, we report the classical ROUGE (Lin and Hovy, 2003) scores on test sets. We report F-measure scores of ROUGE-1 (R1), ROUGE-2 (R2) and ROUGE-L (RL) in the experiments. Table 2 contains the automatic evaluation results on three different datasets. From the results, we can see that: (1) Although extractive methods (e.g., LexRank, W2VCent and Multi-Lead-1) achieve comparable results, their upper bounds are affected by the data sets used. For example, the upper bound results of R2 and RL on Yelp are much lower than the other two, perhaps because most sentences on the Yelp dataset contain more redundant information. (2) Among abstractive models, OpinionDigest and CopyCat perform much better than Opinosis and MeanSum, showing the effectiveness of using opinion phrases or specific distributions to model opinion summaries. But our method surpasses them by a wide margin, indicating that the aspect and sentiment embeddings learned by contrastive learning are beneficial for modeling opinion summaries. (3) Impressively, we observe a large improvement brought by the creation of synthetic datasets (i.e., DenoiseSum, CopyCat and PlanSum), showing the usefulness of using reviews as pseudo-summaries. However, our method is superior to them, illustrating the importance of considering the relevance of the constructed reviews-summary pairs. (4) Overall, our model outperforms all baseline models on three datasets over all three metrics. It is also worth noting that TransSum even surpasses the upper bound of extractive methods on Yelp with an increase of 5.55, 2.3, and 2.2 points in ROUGE-1/2/L.

Human Evaluation
Further, we conduct a human evaluation to evaluate the quality of generated summaries more accurately.
We focus on five criteria: (1) the aspect-based informativeness indicator (Aspect) measures whether the summary covers the common aspects discussed in the reviews; (2) the sentiment-based informativeness indicator (Sentiment) measures whether it agrees with the reviews' overall sentiment about different aspects; (3) the relevance indicator (Relevance) reflects whether the summary is relevant to the input reviews; (4) the non-redundancy indicator (Non-Redundancy) measures whether the summary avoids unnecessary repetition; and (5) the fluency indicator (Fluency) shows whether the summary is well-formed and grammatical. The detailed questions are given in the supplementary materials.

Table 2: Automatic evaluation results on three datasets. We bold the best results and use "-" to indicate unreported results or unavailable outputs. "*" means that the improvements over PlanSum are statistically significant (p ≤ 0.05, t-test), and "Abstr.?" indicates whether the method is abstractive.

We sampled 50, 32, and 50 review groups with human-annotated summaries from the Yelp, Amazon, and RT test sets, respectively. We then employed five graduate students to evaluate each tuple containing summaries from LexRank (a strong extractive baseline), PlanSum (a strong abstractive baseline), TransSum (ours) and the gold-standard summaries according to the criteria. The order in which the summaries are presented to the judges is random. We use Best-Worst Scaling (Louviere et al., 2015), which has been shown to produce more reliable results than rating scales (Kiritchenko and Mohammad, 2016). Specifically, each score is computed as the percentage of times a system was selected as best minus the percentage of times it was selected as worst, and ranges from -1 (unanimously worst) to +1 (unanimously best). The results are shown in Table 3.
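The Best-Worst Scaling score described above reduces to a simple computation (the tallies here are hypothetical, not actual annotation counts):

```python
def bws_score(times_best, times_worst, n_judgments):
    """Best-Worst Scaling: fraction of judgments where the system was
    chosen best minus fraction where it was chosen worst, in [-1, +1]."""
    return (times_best - times_worst) / n_judgments

assert bws_score(30, 10, 50) == 0.4    # picked best 30x, worst 10x
assert bws_score(50, 0, 50) == 1.0     # unanimously best
assert bws_score(0, 50, 50) == -1.0    # unanimously worst
```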
As shown, summaries generated by TransSum have better aspect- and sentiment-based informativeness, indicating that our model can effectively capture the salient opinion information. We find that extractive summaries tend to be more general or even irrelevant (e.g., LexRank on Yelp), while our model performs very well in terms of relevance. Our method also surpasses the baselines in non-redundancy and fluency, showing that summaries generated by TransSum are low-redundancy and fluent. Examples of summaries generated by our model and the comparison systems are given in the supplementary materials.

Loss Effectiveness Analysis
Table 4 presents ablation studies on the three datasets that assess the contribution of the different losses, reporting ROUGE-L on the test sets. Compared to using only the reconstruction loss (row #1), the contrastive learning of aspect and sentiment embeddings (rows #2 and #3) brings improvements of 1.27/0.67/0.57 and 1.18/0.59/0.37 points on the three datasets, respectively. From rows #4 and #5, we observe that the reconstruction and regularization losses are also useful for improving results. The last row shows that all the proposed losses in TransSum are helpful, especially L_asp and L_sen, demonstrating the effectiveness of our model.

Module Effectiveness Analysis
To investigate the importance of the model's individual components, we perform ablations by removing the initialized BERT embeddings, label smoothing, beam search, the translation-based review modeling module, and the weighted pooling operation (i.e., making w_{-i} a uniform distribution). From the results in Table 5, all components play a role, but the most significant drop in ROUGE-L (row #5) occurs when the translation-based review modeling module is removed, demonstrating the strong contribution of the aspect and sentiment embeddings learned through contrastive learning. Interestingly, even without learning aspect and sentiment embeddings, using highly relevant reviews-summary pairs created from only the entangled representations (row #5) still achieves competitive results. This further shows the importance of considering the relevance between reviews and pseudo-summaries.

Related Work
Unsupervised Opinion Summarization aims to automatically generate summaries for a group of opinions about a specific entity (e.g., user reviews of a product) without requiring any gold summaries (Ku et al., 2006; Kim et al., 2011; Chu and Liu, 2019). Most previous works focus on extractive approaches, which select a subset of salient sentences from the inputs based on topic words (Paul et al., 2010; Fabbrizio et al., 2014), word frequencies (Erkan and Radev, 2004; Nenkova and Vanderwende, 2005), word embeddings (Rossiello et al., 2017) or textual graphs. However, due to the shortcoming of copying text from the input (Banko and Vanderwende, 2004), studies of abstractive summarization methods have increased tremendously (Ganesan et al., 2010; Perez-Beltrachini et al., 2019; Zou et al., 2020; Mukherjee et al., 2020). Most of these abstractive works treat opinion summarization as a standard multi-document summarization task, using an autoencoder framework with attention (Chu and Liu, 2019; Amplayo and Lapata, 2019; Brazinskas et al., 2020a), variational distributions (Brazinskas et al., 2020b), or abstract meaning representations (Liu et al., 2015). Few of them pay attention to the opinion information itself, modeling the opinion summary with opinion phrases or with aspect and sentiment distributions (Amplayo et al., 2021). To the best of our knowledge, we are the first to model opinion summaries with only aspect and sentiment embeddings, learned through two novel contrastive objectives based on the aspect and sentiment invariances.
Our work is also related to contrastive learning, a popular unsupervised learning paradigm in computer vision and speech that enlarges the embedding disagreements between different instances for representation learning (van den Oord et al., 2018; Ye et al., 2019; He et al., 2020). Although there have been studies using contrastive learning for summary evaluation, to the best of our knowledge we are the first to use contrastive transformations of natural textual samples to directly aid summary generation, opening the door to research on modeling opinion summaries with aspect and sentiment embeddings.

Conclusion
In this paper, we proposed TransSum, a novel self-supervised framework that generates opinion summaries from only aspect and sentiment embeddings, which are beneficial for maximizing informativeness, reducing the redundancy of repeated opinions in reviews, and creating synthetic datasets of highly relevant reviews-summary pairs for training. Extensive evaluation and ablation studies show that our model outperforms competitive systems in generating informative, highly relevant, low-redundancy and fluent summaries. We believe that modeling opinion summaries with only aspect and sentiment embeddings may pave a new way toward more complex and controllable systems for unsupervised opinion summarization.

Acknowledgements
This work was supported by the National Natural Science Foundation of China (61772036), the Beijing Academy of Artificial Intelligence (BAAI) and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology). We thank the anonymous reviewers for their helpful comments. Xiaojun Wan is the corresponding author.

Input reviews:
1. i got the roast duck won ton noodles . the noodles were good and firm and the wontons were 100 % shrimp which was very good . the roast duck and roast suckling pig was bland . but not bad for the prices .
2. lost on the shuffle . lucky i had extra time . ordered soup noodles with beef brisket and tendon . normally very fast as everything is already cooked . they had to ask me about my order 3 times after i asked them to check on it after waiting for 25 mins . when the food finally came , taste was good and portion was pretty big . beef tendon and brisket noodle ... the tendon was sooo soft and gooey .... mmm . but i had to deduct a point for the service mishap . and it wasn ' t busy yet ...
3. always fantastic food with great prices . i went every weekend for a month in the summer . the owners are always friendly . if you are going later in the evening or late , don ' t order the tea ( milk tea ) or coffee . they boil it all day and by then its completely gross . but other wise , i ' ve never had a bad experience here .
4. food portions were small and nothing special . bonus its its open late .
5. -solid chinese eats -it gives me a good feeling when a restaurant is full of people . and this one normally is . ( especially those of the same ethnic background as the cuisine ) -if ever i have a craving for congee or roasted pork on rice , i ' m here . -oh and it ' s mad cheapwhich is a nice bonus . i dig healthy competition .
6. the price point is a little higher than the places i frequent in richmond hill / markham and the selection is smaller , but if i ever craze decent , solid , authentic chinese food when i ' m downtown , i come here !
7. initially went to chinatown to eat beef brisket noodle soup at kings noodle but they were closed on wednesdays . walked down dundas and found this spot , decided to try it and as really surprised . the noodles were tasted good , much more generous portions compared to kings noodle and they were the same price . would recommend this place !
8. this is an awesome place you can go for chinatown area . nice service , delicious food , and what you need more ?
Gold service can be a little slow here . the noodles are really good . i think it ' s a bit expensive though for what you get . there are other places that are cheaper but i don ' t know how they taste , so i can only comment on here . it ' s definitely worth checking out though . i had to wait a bit for my food but still pretty good experience .
LexRank food portions were small and nothing special . bonus its its open late . walked down dundas and found this spot , decided to try it and as really surprised . the noodles were good and firm and the wontons were 100 % shrimp which was very good . the roast duck and roast suckling pig was bland . but not bad for the prices .
PlanSum i've been to this place several times and i have never had a bad experience. the food is always good and the service is good. i love the fact that they are open late, so if you're looking for a quick lunch or dinner, this is the place to go.

Ours
the noodles are good but the price is a little expensive . the staff is always helpful and friendly . an awesome chinese eats you can go in the chinatown area . come by yourself !

Table 6: Examples of opinion summaries generated by multiple systems on the Yelp dataset.
15. This is as far from the Poverty Row gasps of The Blair Witch Project as you can get, and more fun.
16. I wouldn't waste more than the price of a video rental on this one.
17. High-tech remake is dumb and overblown.
18. All logic is deadened by the obnoxious special effects!
19. Once the screaming begins, so will your laughing
20. Glossy but lackluster.
21. It's just a conglomeration of cheap fright tactics and a booming bass track meant to get you to jump out of your seat.
22. An exercise in missed opportunities and bad filmmaking!
23. The Haunting is a muddled mess that defies any rationality.
24. The only thing scary about the new version is realizing that someone keeps giving director Jan De Bont money to make movies.
25. The characters are on the dramatic equivalent of Death Row. ......
Gold sophisticated visual effects fail to offset awkward performances and an uneven script .
LexRank the characters are on the dramatic equivalent of death row .
PlanSum the haunting is a very good movie, but it's a lot of fun, and the filmmakers have been raised by the original.
Ours unfortunately , this is one haunting with the obnoxious special effects that are bloated and wretchedly overdone ! Table 8: Examples of opinion summaries generated by multiple systems on the Rotten Tomatoes dataset.