Yoshihiko Suhara


2022

Comparative Opinion Summarization via Collaborative Decoding
Hayate Iso | Xiaolan Wang | Stefanos Angelidis | Yoshihiko Suhara
Findings of the Association for Computational Linguistics: ACL 2022

Opinion summarization focuses on generating summaries that reflect popular subjective information expressed in multiple online reviews. While generated summaries offer general and concise information about a particular hotel or product, the information may be insufficient to help the user compare multiple different choices. Thus, the user may still struggle with the question “Which one should I pick?” In this paper, we propose the comparative opinion summarization task, which aims at generating two contrastive summaries and one common summary from two different candidate sets of reviews. We develop a comparative summarization framework CoCoSum, which consists of two base summarization models that jointly generate contrastive and common summaries. Experimental results on a newly created benchmark CoCoTrip show that CoCoSum can produce higher-quality contrastive and common summaries than state-of-the-art opinion summarization models. The dataset and code are available at https://github.com/megagonlabs/cocosum

2021

Extractive Opinion Summarization in Quantized Transformer Spaces
Stefanos Angelidis | Reinald Kim Amplayo | Yoshihiko Suhara | Xiaolan Wang | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 9

We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector-Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hundreds of reviews, a significant step towards opinion summarization of practical scope. In addition, QT enables controllable summarization without further training, by utilizing properties of the quantized space to extract aspect-specific summaries. We also make publicly available Space, a large-scale evaluation benchmark for opinion summarizers, comprising general and aspect-specific summaries for 50 hotels. Experiments demonstrate the promise of our approach, which is validated by human studies where judges showed clear preference for our method over competitive baselines.

Convex Aggregation for Opinion Summarization
Hayate Iso | Xiaolan Wang | Yoshihiko Suhara | Stefanos Angelidis | Wang-Chiew Tan
Findings of the Association for Computational Linguistics: EMNLP 2021

Recent advances in text autoencoders have significantly improved the quality of the latent space, which enables models to generate grammatical and consistent text from aggregated latent vectors. As a successful application of this property, unsupervised opinion summarization models generate a summary by decoding the aggregated latent vectors of inputs. More specifically, they perform the aggregation via simple average. However, little is known about how the vector aggregation step affects the generation quality. In this study, we revisit the commonly used simple average approach by examining the latent space and generated summaries. We found that text autoencoders tend to generate overly generic summaries from simply averaged latent vectors due to an unexpected L2-norm shrinkage in the aggregated latent vectors, which we refer to as summary vector degeneration. To overcome this issue, we develop a framework Coop, which searches input combinations for the latent vector aggregation using input-output word overlap. Experimental results show that Coop successfully alleviates the summary vector degeneration issue and establishes new state-of-the-art performance on two opinion summarization benchmarks. Code is available at https://github.com/megagonlabs/coop.

2020

OpinionDigest: A Simple Framework for Opinion Summarization
Yoshihiko Suhara | Xiaolan Wang | Stefanos Angelidis | Wang-Chiew Tan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present OpinionDigest, an abstractive opinion summarization framework, which does not rely on gold-standard summaries for training. The framework uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews, and trains a Transformer model to reconstruct the original reviews from these extractions. At summarization time, we merge extractions from multiple reviews and select the most popular ones. The selected opinions are used as input to the trained Transformer model, which verbalizes them into an opinion summary. OpinionDigest can also generate customized summaries, tailored to specific user needs, by filtering the selected opinions according to their aspect and/or sentiment. Automatic evaluation on Yelp data shows that our framework outperforms competitive baselines. Human studies on two corpora verify that OpinionDigest produces informative summaries and shows promising customization capabilities.

2019

Open Information Extraction from Question-Answer Pairs
Nikita Bhutani | Yoshihiko Suhara | Wang-Chiew Tan | Alon Halevy | H. V. Jagadish
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Open Information Extraction (OpenIE) extracts meaningful structured tuples from free-form text. Most previous work on OpenIE considers extracting data from one sentence at a time. We describe NeurON, a system for extracting tuples from question-answer pairs. One of the main motivations for NeurON is to be able to extend knowledge bases in a way that considers precisely the information that users care about. NeurON addresses several challenges. First, an answer text is often hard to understand without knowing the question, and second, relevant information can span multiple sentences. To address these, NeurON formulates extraction as a multi-source sequence-to-sequence learning task, wherein it combines distributed representations of a question and an answer to generate knowledge facts. We describe experiments on two real-world datasets that demonstrate that NeurON can find a significant number of new and interesting facts to extend a knowledge base compared to state-of-the-art OpenIE methods.

2018

HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Akari Asai | Sara Evensen | Behzad Golshan | Alon Halevy | Vivian Li | Andrei Lopatenko | Daniela Stepanov | Yoshihiko Suhara | Wang-Chiew Tan | Yinzhan Xu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)